[Mesa-dev] [PATCH V2] mesa: Permanently enable features supported by target CPU at compile time.
This will remove the need for unnecessary runtime checks for CPU features if
already supported by target CPU, resulting in smaller and less branchy code.

V2:
- Removed the SSSE3 related part for the not yet merged patch.
- Avoiding redefinition of macros.
---
 src/mesa/x86/common_x86_features.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h
index 66f2cf6..65634aa 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -59,13 +59,39 @@
 #define X86_CPUEXT_3DNOW_EXT (1<<30)
 #define X86_CPUEXT_3DNOW (1<<31)
 
+#ifdef __MMX__
+#define cpu_has_mmx 1
+#else
 #define cpu_has_mmx (_mesa_x86_cpu_features & X86_FEATURE_MMX)
+#endif
+
 #define cpu_has_mmxext (_mesa_x86_cpu_features & X86_FEATURE_MMXEXT)
+
+#ifdef __SSE__
+#define cpu_has_xmm 1
+#else
 #define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
+#endif
+
+#ifdef __SSE2__
+#define cpu_has_xmm2 1
+#else
 #define cpu_has_xmm2 (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
+#endif
+
+#ifdef __3dNOW__
+#define cpu_has_3dnow 1
+#else
 #define cpu_has_3dnow (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
+#endif
+
 #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
+
+#ifdef __SSE4_1__
+#define cpu_has_sse4_1 1
+#else
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
+#endif
 
 #endif
 
-- 
2.1.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
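For readers skimming the thread, the idea in this patch can be sketched in isolation. The snippet below is only an illustration under assumptions: `demo_cpu_features`, `DEMO_FEATURE_SSE2`, `demo_cpu_has_sse2` and `demo_pick_sse2_path()` are hypothetical stand-ins, not Mesa's real symbols. When the compiler already targets SSE2 (`__SSE2__` is predefined), the check becomes the constant 1 and the fallback branch can be discarded; otherwise the runtime feature word is consulted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Mesa's feature word and feature bit. */
#define DEMO_FEATURE_SSE2 (1u << 6)
static uint32_t demo_cpu_features; /* would be filled in by a CPUID probe at startup */

#ifdef __SSE2__
/* The build already targets SSE2, so the check folds to a constant. */
#define demo_cpu_has_sse2 1
#else
/* Otherwise test the bit that was detected at run time. */
#define demo_cpu_has_sse2 (demo_cpu_features & DEMO_FEATURE_SSE2)
#endif

/* Returns 1 when the SSE2 code path would be chosen, 0 for the generic one. */
static int demo_pick_sse2_path(void)
{
   if (demo_cpu_has_sse2)
      return 1; /* with __SSE2__ defined, the line below is dead code */
   return 0;
}
```

Callers keep writing `if (demo_cpu_has_sse2)` in both configurations; only the cost of the check changes.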
Re: [Mesa-dev] [PATCH] egl_dri2: Allow both 24 and 32 bit X visuals for RGBA configs
On Fri, 07 Nov 2014 11:32:04 -0800 Eric Anholt wrote:

> Pekka Paalanen writes:
>
> > On Thu, 06 Nov 2014 13:01:03 -0800
> > Ian Romanick wrote:
> >
> >> I thought Eric and Chad already NAKed it in bugzilla. The problem is
> >> that applications ask for an RGBA visual for GL blending. They use the
> >> alpha channel to generate their images, but the final alpha values are,
> >> basically, random... and the composited result would be pure garbage.
> >
> > Reading
> > https://bugs.freedesktop.org/show_bug.cgi?id=67676#c5
> > "We should certainly be exposing EGLConfigs that match up to the rgba
> > visual, though, so you can find it when you try." - Eric
> >
> > To me that sounds like Eric would accept having the visuals there
> > in additional configs (as long as they are sorted after the otherwise
> > equivalent xRGB configs?). Eric, would you like to confirm your current
> > opinion?
>
> What I believe we want:
>
> Somebody just requesting RGBA with ChooseConfig doesn't end up forced
> into the depth-32 (blending) X visual. This is the most important.
>
> There is some mechanism for somebody that does want the depth 32 visual
> to get an EGL config to draw to it. This is important but secondary to
> not breaking everything else, and them having to jump through hoops is
> reasonable but probably avoidable.

I think that is exactly what everybody already agrees on.

The remaining question seems to be: should we add new configs with the
blending X visual, or wait for an EGL extension that allows requesting
blending in generic terms (*and* add configs with the blending X visual,
since that is essentially required for making, say, a new value for
EGL_TRANSPARENT_TYPE work on X11)? Can you imagine other reasonable
mechanisms?

Btw. I noticed that EGL_TRANSPARENT_TYPE defaults to EGL_NONE for
eglChooseConfig, not DONT_CARE (which is not even allowed in EGL 1.4).
So if we add only configs now, and add the EGL_TRANSPARENT_TYPE=alpha
extension later, apps using eglChooseConfig for initial config filtering
and wanting a blending config will be broken yet again. I suppose one
might even claim that exposing blending configs when
EGL_TRANSPARENT_TYPE=NONE is a violation of the spirit of the spec.

EGL 1.4 says that EGL_TRANSPARENT_TYPE is not a sorting key at all, but
NATIVE_VISUAL_ID is, with an implementation-specified order.

Thanks,
pq
[Mesa-dev] [Bug 85419] [llvmpipe] Assertion fail with triangle strips
https://bugs.freedesktop.org/show_bug.cgi?id=85419

José Fonseca changed:

           What    |Removed                     |Added
                 CC|                            |jfons...@vmware.com
            Summary|Assertion fail with         |[llvmpipe] Assertion fail
                   |triangle strips             |with triangle strips

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [PATCH V3 1/3] mesa: add runtime support for SSSE3
V3:
- remove flag check from config

V2:
- remove unrequired #ifdef bit_SSSE3
- order flag check in config

Signed-off-by: Timothy Arceri
---
 src/mesa/x86/common_x86.c          | 2 ++
 src/mesa/x86/common_x86_features.h | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/x86/common_x86.c b/src/mesa/x86/common_x86.c
index 25f5c40..bef9cf2 100644
--- a/src/mesa/x86/common_x86.c
+++ b/src/mesa/x86/common_x86.c
@@ -352,6 +352,8 @@ _mesa_get_x86_features(void)
       __get_cpuid(1, &eax, &ebx, &ecx, &edx);
+      if (ecx & bit_SSSE3)
+         _mesa_x86_cpu_features |= X86_FEATURE_SSSE3;
       if (ecx & bit_SSE4_1)
          _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
    }
diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h
index 66f2cf6..6eb2b38 100644
--- a/src/mesa/x86/common_x86_features.h
+++ b/src/mesa/x86/common_x86_features.h
@@ -43,7 +43,8 @@
 #define X86_FEATURE_XMM2 (1<<6)
 #define X86_FEATURE_3DNOWEXT (1<<7)
 #define X86_FEATURE_3DNOW (1<<8)
-#define X86_FEATURE_SSE4_1 (1<<9)
+#define X86_FEATURE_SSSE3 (1<<9)
+#define X86_FEATURE_SSE4_1 (1<<10)
 
 /* standard X86 CPU features */
 #define X86_CPU_FPU (1<<0)
@@ -65,6 +66,7 @@
 #define cpu_has_xmm2 (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
 #define cpu_has_3dnow (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
 #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
+#define cpu_has_ssse3 (_mesa_x86_cpu_features & X86_FEATURE_SSSE3)
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 
 #endif
-- 
1.9.3
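The runtime probe added above can be tried outside of Mesa with a small standalone sketch. It assumes a GCC or Clang toolchain on x86, where `<cpuid.h>` provides `__get_cpuid()` and `bit_SSSE3`; `DEMO_FEATURE_SSSE3` and `demo_detect_ssse3()` are hypothetical names chosen for illustration, and on non-x86 targets the function simply reports no features.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h> /* GCC/Clang helper header: __get_cpuid() and bit_SSSE3 */
#endif

/* Hypothetical feature bit, mirroring X86_FEATURE_SSSE3 from the patch. */
#define DEMO_FEATURE_SSSE3 (1u << 9)

/* Returns DEMO_FEATURE_SSSE3 if CPUID leaf 1 reports SSSE3 in ECX, else 0. */
static unsigned demo_detect_ssse3(void)
{
   unsigned features = 0;
#if defined(__i386__) || defined(__x86_64__)
   unsigned eax, ebx, ecx, edx;
   /* __get_cpuid() returns 0 when the requested leaf is not supported. */
   if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & bit_SSSE3))
      features |= DEMO_FEATURE_SSSE3;
#endif
   return features;
}
```

The result would normally be OR-ed into a global feature word once at startup, so later checks are a single bit test.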
[Mesa-dev] [PATCH V3 3/3] i965: add runtime check for SSSE3 rgba8_copy
Callgrind cpu usage results from pts benchmarks:

For ytile_copy_faster()

Nexuiz 1.6.1: 2.48% -> 0.97%

V3:
- rather than putting the ssse3 code in a different file in order to compile,
  make use of gcc pragma for per function optimisations. Results in improved
  performance and less impact on those not needing runtime ssse3 checks.

V2:
- put back the if statements and add one for the SSSE3 rgba8_copy
- move some header files out of the header
- don't indent the preprocessor tests
- changed copyright to Google and add author Frank Henigman

Signed-off-by: Timothy Arceri
---
 src/mesa/drivers/dri/i965/intel_tex_subimage.c | 88 +-
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tex_subimage.c b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
index cb5738a..c6eda5c 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_subimage.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_subimage.c
@@ -42,8 +42,13 @@
 #include "intel_mipmap_tree.h"
 #include "intel_blit.h"
 
-#ifdef __SSSE3__
+#include "x86/common_x86_asm.h"
+#include "x86/x86_function_opt.h"
+
+#if defined(SSSE3_FUNC_OPT_START)
+SSSE3_FUNC_OPT_START
 #include
+SSSE3_FUNC_OPT_END
 #endif
 
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
@@ -175,7 +180,8 @@ err:
    return false;
 }
 
-#ifdef __SSSE3__
+#if defined(SSSE3_FUNC_OPT_START)
+SSSE3_FUNC_OPT_START
 static const uint8_t rgba8_permutation[16] =
    { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };
 
@@ -185,24 +191,18 @@ static const uint8_t rgba8_permutation[16] =
       (__m128i) _mm_loadu_ps((float *)(src)), \
       *(__m128i *) rgba8_permutation \
    )
-#endif
 
-/**
- * Copy RGBA to BGRA - swap R and B.
+/* Fast copying for tile spans.
+ *
+ * As long as the destination texture is 16 aligned,
+ * any 16 or 64 spans we get here should also be 16 aligned.
*/ static inline void * -rgba8_copy(void *dst, const void *src, size_t bytes) +ssse3_fast_rgba8_copy(void *dst, const void *src, size_t bytes) { uint8_t *d = dst; uint8_t const *s = src; -#ifdef __SSSE3__ - /* Fast copying for tile spans. -* -* As long as the destination texture is 16 aligned, -* any 16 or 64 spans we get here should also be 16 aligned. -*/ - if (bytes == 16) { assert(!(((uintptr_t)dst) & 0xf)); rgba8_copy_16(d+ 0, s+ 0); @@ -217,8 +217,30 @@ rgba8_copy(void *dst, const void *src, size_t bytes) rgba8_copy_16(d+48, s+48); return dst; } + + while (bytes >= 4) { + d[0] = s[2]; + d[1] = s[1]; + d[2] = s[0]; + d[3] = s[3]; + d += 4; + s += 4; + bytes -= 4; + } + return dst; +} +SSSE3_FUNC_OPT_END #endif +/** + * Copy RGBA to BGRA - swap R and B. + */ +static inline void * +rgba8_copy(void *dst, const void *src, size_t bytes) +{ + uint8_t *d = dst; + uint8_t const *s = src; + while (bytes >= 4) { d[0] = s[2]; d[1] = s[1]; @@ -355,6 +377,12 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, if (mem_copy == memcpy) return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, dst, src, src_pitch, swizzle_bit, memcpy); + #if defined(SSSE3_FUNC_OPT_START) + else if (mem_copy == ssse3_fast_rgba8_copy) + return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, + dst, src, src_pitch, swizzle_bit, + ssse3_fast_rgba8_copy); + #endif else if (mem_copy == rgba8_copy) return xtile_copy(0, 0, xtile_width, xtile_width, 0, xtile_height, dst, src, src_pitch, swizzle_bit, rgba8_copy); @@ -362,6 +390,12 @@ xtile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, if (mem_copy == memcpy) return xtile_copy(x0, x1, x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, memcpy); + #if defined(SSSE3_FUNC_OPT_START) + else if (mem_copy == ssse3_fast_rgba8_copy) + return xtile_copy(x0, x1, x2, x3, y0, y1, + dst, src, src_pitch, swizzle_bit, + ssse3_fast_rgba8_copy); + #endif else if (mem_copy == rgba8_copy) return xtile_copy(x0, x1, 
x2, x3, y0, y1, dst, src, src_pitch, swizzle_bit, rgba8_copy); @@ -391,6 +425,12 @@ ytile_copy_faster(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, if (mem_copy == memcpy) return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height, dst, src, src_pitch, swizzle_bit, memcpy); + #if defined(SSSE3_FUNC_OPT_START) + else if (mem_copy == ssse3_fast_rgba8_copy) + return ytile_copy(0, 0, ytile_width, ytile_width, 0, ytile_height, + dst, src, src_pitch, swizzle_bit, + ssse3_fast_rgba8_copy); + #endif else if (mem_copy == rgba
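The `*_copy_faster()` hunks above compare the `mem_copy` pointer against known functions and then call the tiled copy with that function named literally. The likely intent, which is an assumption here because the patch text does not state it, is to let the compiler inline a specialised copy loop per callback. A simplified sketch of that pattern, using hypothetical `demo_*` names and a plain row copy instead of Mesa's tiled layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void *(*demo_copy_fn)(void *dst, const void *src, size_t bytes);

/* Scalar RGBA -> BGRA copy: swaps R and B in every 4-byte pixel. */
static void *demo_rgba8_copy(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;
   while (bytes >= 4) {
      d[0] = s[2];
      d[1] = s[1];
      d[2] = s[0];
      d[3] = s[3];
      d += 4;
      s += 4;
      bytes -= 4;
   }
   return dst;
}

/* Generic row copy; 'copy' is a constant at each call site below. */
static inline void demo_copy_rows(uint8_t *dst, const uint8_t *src,
                                  size_t rows, size_t row_bytes,
                                  demo_copy_fn copy)
{
   for (size_t r = 0; r < rows; r++)
      copy(dst + r * row_bytes, src + r * row_bytes, row_bytes);
}

/* Dispatch once on the pointer, then call with a literal function name. */
static void demo_copy_rows_faster(uint8_t *dst, const uint8_t *src,
                                  size_t rows, size_t row_bytes,
                                  demo_copy_fn copy)
{
   if (copy == memcpy)
      demo_copy_rows(dst, src, rows, row_bytes, memcpy);
   else if (copy == demo_rgba8_copy)
      demo_copy_rows(dst, src, rows, row_bytes, demo_rgba8_copy);
   else
      demo_copy_rows(dst, src, rows, row_bytes, copy);
}
```

Adding an SSSE3 variant in this scheme means one more `else if` branch per dispatcher, which is what the patch does inside its `#if defined(SSSE3_FUNC_OPT_START)` blocks.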
[Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
Signed-off-by: Timothy Arceri
---
 src/mesa/x86/x86_function_opt.h | 42 +
 1 file changed, 42 insertions(+)
 create mode 100644 src/mesa/x86/x86_function_opt.h

Using a macro like this means we can easily enable runtime support in clang
once it also supports it. Also it's less of an impact for those compiling
with the optimisations enabled.
Finally I'm assuming it's also better for LTO.

diff --git a/src/mesa/x86/x86_function_opt.h b/src/mesa/x86/x86_function_opt.h
new file mode 100644
index 000..c1ffb19
--- /dev/null
+++ b/src/mesa/x86/x86_function_opt.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (C) Timothy Arceri
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Author:
+ *    Timothy Arceri
+ *
+ */
+
+/*
+ * Helper macros to enable per function optimisations
+ *
+ */
+
+#ifdef __SSSE3__
+   #define SSSE3_FUNC_OPT_START
+   #define SSSE3_FUNC_OPT_END
+#else
+   #if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 3))
+      #define SSSE3_FUNC_OPT_START _Pragma("GCC push_options") \
+                                   _Pragma("GCC target(\"ssse3\")")
+      #define SSSE3_FUNC_OPT_END _Pragma("GCC pop_options")
+   #endif
+#endif
-- 
1.9.3
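To make the per-function approach concrete, here is a minimal sketch of compiling a single function for SSSE3 while the rest of the file keeps the default target. Note the substitutions: it uses the `target` function attribute rather than the `_Pragma("GCC target")` pair from the patch, and GCC/Clang's `__builtin_cpu_supports()` rather than Mesa's `cpu_has_ssse3`. All `demo_*` names are hypothetical, and a reasonably recent GCC or Clang on x86 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#include <tmmintrin.h> /* SSSE3 intrinsics, including _mm_shuffle_epi8 */
#define DEMO_HAVE_SSSE3_PATH 1

/* Shuffle mask that swaps R and B inside each of four RGBA pixels. */
static const uint8_t demo_permutation[16] =
   { 2,1,0,3, 6,5,4,7, 10,9,8,11, 14,13,12,15 };

/* Only this function is built with SSSE3 enabled. */
static __attribute__((target("ssse3")))
void demo_swap_rb_16_ssse3(uint8_t *dst, const uint8_t *src)
{
   __m128i mask = _mm_loadu_si128((const __m128i *)demo_permutation);
   __m128i px = _mm_loadu_si128((const __m128i *)src);
   _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(px, mask));
}
#endif

/* Portable fallback producing the same 16 output bytes. */
static void demo_swap_rb_16_scalar(uint8_t *dst, const uint8_t *src)
{
   for (size_t i = 0; i < 16; i += 4) {
      dst[i + 0] = src[i + 2];
      dst[i + 1] = src[i + 1];
      dst[i + 2] = src[i + 0];
      dst[i + 3] = src[i + 3];
   }
}

/* Picks the SSSE3 version only when the running CPU reports support. */
static void demo_swap_rb_16(uint8_t *dst, const uint8_t *src)
{
#ifdef DEMO_HAVE_SSSE3_PATH
   if (__builtin_cpu_supports("ssse3")) {
      demo_swap_rb_16_ssse3(dst, src);
      return;
   }
#endif
   demo_swap_rb_16_scalar(dst, src);
}
```

The runtime check matters: calling the SSSE3 function on a CPU without the extension would fault, which is the situation the build setups discussed in this thread need to avoid.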
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
I would rather not use compiler-specific hacks in Mesa. If it was a personal pet project it would make sense.

Best regards,
Siavash Eliasi.
[Mesa-dev] Mesa 10.3.3
Mesa 10.3.3 has been released. Mesa 10.3.3 is a bug fix release fixing bugs since the 10.3.2 release, (see below for a list of changes). The tag in the git repository for Mesa 10.3.3 is 'mesa-10.3.3'. Mesa 10.3.3 is available for download at ftp://freedesktop.org/pub/mesa/10.3.3/ SHA-256 checksums (can be verified with the sha256sum program): 23a0c36d88cd5d8968ae6454160de2878192fd1d37b5d606adca1f1b7e788b79 MesaLib-10.3.3.tar.gz 0e4eee4a2ddf86456eed2fc44da367f95471f74249636710491e85cc256c4753 MesaLib-10.3.3.tar.bz2 a83648f17d776b7cf6c813fbb15782d2644b937dc6a7c53d8c0d1b35411f4840 MesaLib-10.3.3.zip I have verified building from the .tar.bz2 file by doing: tar xjf MesaLib-10.3.3.tar.bz2 cd Mesa-10.3.3 ./configure --enable-gallium-llvm make -j6 make -j6 install I have also verified that I pushed the tag. -Emil -- Changes from 10.3.2 to 10.3.3: Anuj Phogat (2): glsl: Fix crash due to negative array index glsl: Use signed array index in update_max_array_access() Brian Paul (1): mesa: fix UNCLAMPED_FLOAT_TO_UBYTE() macro for MSVC Emil Velikov (3): docs: Add sha256 sums for the 10.3.2 release Update version to 10.3.3 Add release notes for the 10.3.3 release Ilia Mirkin (27): freedreno/ir3: fix FSLT/etc handling to return 0/-1 instead of 0/1.0 freedreno/ir3: INEG operates on src0, not src1 freedreno/ir3: add UARL support freedreno/ir3: negate result of USLT/etc freedreno/ir3: use unsigned comparison for UIF freedreno/ir3: add TXL support freedreno/ir3: fix UCMP handling freedreno/ir3: implement UMUL correctly freedreno: add default .dir-locals.el for emacs settings freedreno/ir3: make texture instruction construction more dynamic freedreno/ir3: fix TXB/TXL to actually pull the bias/lod argument freedreno/ir3: add TXQ support freedreno/ir3: add TXB2 support freedreno: dual-source render targets are not supported freedreno: instanced drawing/compute not yet supported freedreno/ir3: avoid fan-in sources referring to same instruction freedreno/ir3: add IDIV/UDIV support 
freedreno/ir3: add UMOD support, based on UDIV freedreno/ir3: add MOD support freedreno/ir3: add ISSG support freedreno/ir3: add UMAD support freedreno/ir3: make TXQ return integers, not floats freedreno/ir3: shadow comes before array freedreno/ir3: add texture offset support freedreno/ir3: add TXD support and expose ARB_shader_texture_lod freedreno/ir3: add TXF support freedreno: positions come out as integers, not half-integers Jan Vesely (1): configure: include llvm systemlibs when using static llvm Marek Olšák (5): r600g: fix polygon mode for points and lines and point/line fill modes radeonsi: fix polygon mode for points and lines and point/line fill modes radeonsi: fix incorrect index buffer max size for lowered 8-bit indices Revert "st/mesa: set MaxUnrollIterations = 255" r300g: remove enabled/disabled hyperz and AA compression messages Mauro Rossi (1): gallium/nouveau: fully build the driver under android Michel Dänzer (1): radeon/llvm: Dynamically allocate branch/loop stack arrays Rob Clark (62): freedreno/ir3: detect scheduler fail freedreno/ir3: add TXB freedreno/ir3: add DDX/DDY freedreno/ir3: bit of debug freedreno/ir3: fix error in bail logic freedreno/ir3: fix constlen with relative addressing freedreno/ir3: add no-copy-propagate fallback step freedreno: don't overflow cmdstream buffer so much freedreno/ir3: fix potential segfault in RA freedreno: update generated headers freedreno/a3xx: enable hw primitive-restart freedreno/a3xx: handle rendering to layer != 0 freedreno: update generated headers freedreno/a3xx: format fixes util/u_format: add _is_alpha() freedreno/a3xx: alpha render-target shenanigans freedreno/ir3: catch incorrect usage of tmp-dst freedreno/ir3: add missing put_dst freedreno: "fix" problems with excessive flushes freedreno: update generated headers freedreno/a3xx: 3d/array textures freedreno: add DRM_CONF_SHARE_FD freedreno/a3xx: more texture array fixes freedreno/a3xx: initial texture border-color freedreno: fix compiler warning 
freedreno: don't advertise mirror-clamp support freedreno: update generated headers freedreno: we have more than 0 viewports! freedreno: turn missing caps into compile warnings freedreno/a3xx: add LOD_BIAS freedreno/a3xx: add flat interpolation mode freedreno/a3xx: add 32bit integer vtx formats freedreno/a3xx: fix border color order freedreno: move bind_sampler_states to per-generation freedreno: add texcoord clamp support to lowering freedreno/a3xx: add support to emulate GL_CLAMP freedreno/a3xx: re-emit shade
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
On Sat, Nov 8, 2014 at 4:59 AM, Siavash Eliasi wrote:
> I rather to not use compiler specific hacks in mesa. If it was a personal
> pet project it would make sense.

We use compiler-specific things all the time. That's not going to change.
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
On 08/11/14 11:12, Timothy Arceri wrote:
> Signed-off-by: Timothy Arceri
As long as it fixes odd combinations such as the following, I'm all
in favour of using such an approach. It will save us quite a few
"lovely" details - split the file, configure checks etc...

https://bugs.freedesktop.org/show_bug.cgi?id=71547

Just a small nit below :)

Thanks
Emil

> ---
>  src/mesa/x86/x86_function_opt.h | 42 +
>  1 file changed, 42 insertions(+)
>  create mode 100644 src/mesa/x86/x86_function_opt.h
>
> Using a macro like this means we can easily enable runtime support in clang
> once it also supports it. Also its less of an impact for those compiling
> with the optimisations enabled.
> Finally I'm assuming its also better for lto.
>
> diff --git a/src/mesa/x86/x86_function_opt.h b/src/mesa/x86/x86_function_opt.h
> new file mode 100644
> index 000..c1ffb19
> --- /dev/null
> +++ b/src/mesa/x86/x86_function_opt.h
> @@ -0,0 +1,42 @@
> +/*
> + * Copyright (C) Timothy Arceri
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included
> + * in all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
> + * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Author:
> + *    Timothy Arceri
> + *
> + */
> +
> +/*
> + * Helper macros to enable per function optimisations
> + *
> + */
> +
> +#ifdef __SSSE3__
> +   #define SSSE3_FUNC_OPT_START
> +   #define SSSE3_FUNC_OPT_END
> +#else
> +   #if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 3))
Normally I've preferred to have
#if defined(__GNUC__) && (__GNUC__ > 4

> +      #define SSSE3_FUNC_OPT_START _Pragma("GCC push_options") \
> +                                   _Pragma("GCC target(\"ssse3\")")
> +      #define SSSE3_FUNC_OPT_END _Pragma("GCC pop_options")
> +   #endif
> +#endif
>
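The nit above concerns how the compiler-version guard is spelled. In `#if`, an undefined identifier evaluates to 0, so the original check still gives the right answer on non-GCC compilers, but it can trigger `-Wundef` warnings and hides the intent. A small sketch of the guarded form follows; `DEMO_HAS_TARGET_PRAGMA` and `demo_has_target_pragma()` are hypothetical names, and the 4.4 threshold simply mirrors the patch.

```c
#include <assert.h>

/* Test for the compiler family first, then for the version (GCC newer than 4.3). */
#if defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 3))
#define DEMO_HAS_TARGET_PRAGMA 1
#else
#define DEMO_HAS_TARGET_PRAGMA 0
#endif

/* Returns 1 if the guarded feature was enabled for this build, else 0. */
static int demo_has_target_pragma(void)
{
   return DEMO_HAS_TARGET_PRAGMA;
}
```

Defining the macro to 0 or 1 in both branches also lets later code use `#if DEMO_HAS_TARGET_PRAGMA` instead of `#if defined(...)`.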
Re: [Mesa-dev] [PATCH V2] mesa: Permanently enable features supported by target CPU at compile time.
On 08/11/14 08:35, Siavash Eliasi wrote:
> This will remove the need for unnecessary runtime checks for CPU features if
> already supported by target CPU, resulting in smaller and less branchy code.
>
A comment I could not withhold based on your earlier post - "We require
micro-benchmark for this code. It will take me hours to find why mesa is
so slow now :P"

Ideally mesa should have an infrastructure/farm that handles regressions
- be that performance or otherwise. Pretty sure some companies have such
features but those seem to be hidden behind locked doors :'(

But on a more mature note, currently only cpu_has_xmm
(_tnl_generate_sse_emit) and cpu_has_sse4_1 (vbo_get_minmax_index) are
actually useful, with the former being of questionable benefit :P

Can you confirm that it does not cause issues with "interesting" setups
such as https://bugs.freedesktop.org/show_bug.cgi?id=71547 ?

Thanks
Emil

> V2:
> - Removed the SSSE3 related part for the not yet merged patch.
> - Avoiding redefinition of macros.
> ---
>  src/mesa/x86/common_x86_features.h | 26 ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h
> index 66f2cf6..65634aa 100644
> --- a/src/mesa/x86/common_x86_features.h
> +++ b/src/mesa/x86/common_x86_features.h
> @@ -59,13 +59,39 @@
>  #define X86_CPUEXT_3DNOW_EXT (1<<30)
>  #define X86_CPUEXT_3DNOW (1<<31)
>
> +#ifdef __MMX__
> +#define cpu_has_mmx 1
> +#else
>  #define cpu_has_mmx (_mesa_x86_cpu_features & X86_FEATURE_MMX)
> +#endif
> +
>  #define cpu_has_mmxext (_mesa_x86_cpu_features & X86_FEATURE_MMXEXT)
> +
> +#ifdef __SSE__
> +#define cpu_has_xmm 1
> +#else
>  #define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
> +#endif
> +
> +#ifdef __SSE2__
> +#define cpu_has_xmm2 1
> +#else
>  #define cpu_has_xmm2 (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
> +#endif
> +
> +#ifdef __3dNOW__
> +#define cpu_has_3dnow 1
> +#else
>  #define cpu_has_3dnow (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
> +#endif
> +
>  #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
> +
> +#ifdef __SSE4_1__
> +#define cpu_has_sse4_1 1
> +#else
>  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
> +#endif
>
>  #endif
>
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
On Sat, 2014-11-08 at 18:13 +, Emil Velikov wrote:
> On 08/11/14 11:12, Timothy Arceri wrote:
> > Signed-off-by: Timothy Arceri
> As long as it fixes odd combinations such as this the following I'm all
> in favour of using such an approach. It will save us quite a few
> "lovely" details - split the file, configure checks etc...
>
> https://bugs.freedesktop.org/show_bug.cgi?id=71547

Hmmm, what a pain. I'm not sure what "GCC target" will do in this case;
I will need to check. It will be a shame if we can't enable these
optimisations just because people wish to build in this way.

>
> Just a small nit below :)
>
> Thanks
> Emil
[Mesa-dev] [PATCH] r600g: Implement GL_ARB_draw_indirect
Requires evergreen/cayman, and updated radeon kernel module. Signed-off-by: Glenn Kennard --- See also kernel side patch sent to dri-de...@lists.freedesktop.org docs/GL3.txt | 4 +- docs/relnotes/10.4.html | 1 + src/gallium/drivers/r600/evergreend.h| 7 ++- src/gallium/drivers/r600/r600_pipe.c | 6 ++- src/gallium/drivers/r600/r600_state_common.c | 80 ++-- 5 files changed, 77 insertions(+), 21 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 2854431..06c52f9 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft GL 4.0, GLSL 4.00: GL_ARB_draw_buffers_blendDONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe) - GL_ARB_draw_indirect DONE (i965, nvc0, radeonsi, llvmpipe, softpipe) + GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe) GL_ARB_gpu_shader5 DONE (i965, nvc0) - 'precise' qualifierDONE - Dynamically uniform sampler array indices DONE (r600) @@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30: GL_ARB_framebuffer_no_attachmentsnot started GL_ARB_internalformat_query2 not started GL_ARB_invalidate_subdataDONE (all drivers) - GL_ARB_multi_draw_indirect DONE (i965, nvc0, radeonsi, llvmpipe, softpipe) + GL_ARB_multi_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe) GL_ARB_program_interface_query not started GL_ARB_robust_buffer_access_behavior not started GL_ARB_shader_image_size not started diff --git a/docs/relnotes/10.4.html b/docs/relnotes/10.4.html index d0fbd3b..9c2a491 100644 --- a/docs/relnotes/10.4.html +++ b/docs/relnotes/10.4.html @@ -49,6 +49,7 @@ Note: some of the new features are only available with certain drivers. 
GL_ARB_texture_view on nv50, nvc0 GL_ARB_clip_control on llvmpipe, softpipe, r300, r600, radeonsi GL_KHR_context_flush_control on all drivers +GL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600 diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h index 4989996..b8880c8 100644 --- a/src/gallium/drivers/r600/evergreend.h +++ b/src/gallium/drivers/r600/evergreend.h @@ -64,6 +64,8 @@ #define R600_TEXEL_PITCH_ALIGNMENT_MASK0x7 #define PKT3_NOP 0x10 +#define PKT3_SET_BASE 0x11 +#define PKT3_INDEX_BUFFER_SIZE 0x13 #define PKT3_DEALLOC_STATE 0x14 #define PKT3_DISPATCH_DIRECT 0x15 #define PKT3_DISPATCH_INDIRECT 0x16 @@ -72,12 +74,15 @@ #define PKT3_REG_RMW 0x21 #define PKT3_COND_EXEC 0x22 #define PKT3_PRED_EXEC 0x23 -#define PKT3_START_3D_CMDBUF 0x24 +#define PKT3_DRAW_INDIRECT 0x24 +#define PKT3_DRAW_INDEX_INDIRECT 0x25 +#define PKT3_INDEX_BASE0x26 #define PKT3_DRAW_INDEX_2 0x27 #define PKT3_CONTEXT_CONTROL 0x28 #define PKT3_DRAW_INDEX_IMMD_BE0x29 #define PKT3_INDEX_TYPE0x2A #define PKT3_DRAW_INDEX0x2B +#define PKT3_DRAW_INDIRECT_MULTI 0x2C #define PKT3_DRAW_INDEX_AUTO 0x2D #define PKT3_DRAW_INDEX_IMMD 0x2E #define PKT3_NUM_INSTANCES 0x2F diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 0b571e4..829deaf 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -313,6 +313,11 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) return family >= CHIP_CEDAR ? 1 : 0; case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family >= CHIP_CEDAR ? 4 : 0; + case PIPE_CAP_DRAW_INDIRECT: + /* needs kernel command checking support to work */ + if (family >= CHIP_CEDAR && rscreen->b.info.drm_minor >= 41) + return 1; + return 0; /* Unsupported features. 
*/ case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT: @@ -322,7 +327,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_VERTEX_COLOR_CLAMPED: case PIPE_CAP_USER_VERTEX_BUFFERS: case PIPE_CAP_TEXTURE_GATHER_OFFSETS: - case PIPE_CAP_DRAW_INDIRECT: case PIPE_CAP_CONDITIONAL_RENDER_INVERTED: cas
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
On Sat, 2014-11-08 at 16:29 +0330, Siavash Eliasi wrote:
> I rather to not use compiler specific hacks in mesa. If it was a
> personal pet project it would make sense.
>
> Best regards,
> Siavash Eliasi.

Having to work around compiler differences is a real-world problem. As
has been pointed out before, MSVC handles function-specific optimisations
using intrinsics automatically. GCC and Clang don't, so to avoid the
multiple issues caused by having to move things to a different file we
need to work around it. Thankfully, macros allow the impact to be
minimised.
Re: [Mesa-dev] [PATCH] gbm: dlopen libglapi so gbm_create_device works
On 06/11/14 21:29, Frank Henigman wrote:
> From: Frank Henigman
>
> Dri driver libs are not linked to pull in libglapi so gbm_create_device()
> fails when it tries to dlopen them (unless the application is linked
> with something that does pull in libglapi, like libGL).
> Until dri drivers can be fixed properly, dlopen libglapi before trying
> to dlopen them.
> https://bugs.freedesktop.org/show_bug.cgi?id=57702
>
Hi Frank,

I think I can understand the frustration that this has caused you, so
unless there are any objections I will gladly pick it up for 10.4
(and, if there are no side effects, for the stable 10.3 branch).

Just a couple of nits, which I'm planning to fix prior to pushing this
(a week from now, just before the branchpoint):

* the bugzilla report mentions libglapi, but in a different light, so
  I'll rephrase the commit msg a bit.
* we might as well print out an error message and bail out when the
  dlopen fails.

Thanks for bringing this up :)
-Emil

> Signed-off-by: Frank Henigman
> ---
>  src/gbm/backends/dri/gbm_dri.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
> index f637e32..6ea2294 100644
> --- a/src/gbm/backends/dri/gbm_dri.c
> +++ b/src/gbm/backends/dri/gbm_dri.c
> @@ -311,6 +311,11 @@ dri_open_driver(struct gbm_dri_device *dri)
>     if (search_paths == NULL)
>        search_paths = DEFAULT_DRIVER_DIR;
>
> +   /* Temporarily work around dri driver libs that need symbols in libglapi
> +    * but don't automatically link it in.
> +    */
> +   dlopen("libglapi.so.0", RTLD_LAZY | RTLD_GLOBAL);
> +
>     dri->driver = NULL;
>     end = search_paths + strlen(search_paths);
>     for (p = search_paths; p < end && dri->driver == NULL; p = next + 1) {
>
Re: [Mesa-dev] [PATCH V2] mesa: Permanently enable features supported by target CPU at compile time.
On Sat, 2014-11-08 at 18:25 +, Emil Velikov wrote:
> On 08/11/14 08:35, Siavash Eliasi wrote:
> > This will remove the need for unnecessary runtime checks for CPU features if
> > already supported by target CPU, resulting in smaller and less branchy code.
> >
> A comment I could not withheld based on your earlier post - "We require
> micro-benchmark for this code. It will take me hours to find why mesa is
> so slow now :P"

Hehe, you are technically correct, the best kind of correct.

>
> Ideally mesa should have an infrastructure/farm that handles regressions
> - be that performance or otherwise. Pretty sure some companies have such
> features but those seem to be hidden behind locked doors :'(

I considered trying to work on a solution for this where anyone could
volunteer their machine to run such tests and send the results back to a
central server. But in the end it's a pretty big project to get something
like this working correctly, and to make it easy to set up and thus
participate in. I guess you could leverage something like the oibaf ppa
to start with rather than pulling down each commit and building.
Anyway, it's a much bigger project than I have time for at the moment.

>
> But on a more mature note, currently only cpu_has_xmm
> (_tnl_generate_sse_emit) and cpu_has_sse4_1(vbo_get_minmax_index) are
> actually useful, with the former of questionable amount :P
>
> Can you confirm that it does not cause issues with "interesting" setups
> such as https://bugs.freedesktop.org/show_bug.cgi?id=71547
>

I think this patch should be fine in this case, as the solution there was
to wrap the code with #ifdef __SSE4_1__, which is what makes this patch
work.

>
> Thanks
> Emil
>
> > V2:
> > - Removed the SSSE3 related part for the not yet merged patch.
> > - Avoiding redefinition of macros.
> > ---
> >  src/mesa/x86/common_x86_features.h | 26 ++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/src/mesa/x86/common_x86_features.h b/src/mesa/x86/common_x86_features.h
> > index 66f2cf6..65634aa 100644
> > --- a/src/mesa/x86/common_x86_features.h
> > +++ b/src/mesa/x86/common_x86_features.h
> > @@ -59,13 +59,39 @@
> >  #define X86_CPUEXT_3DNOW_EXT (1<<30)
> >  #define X86_CPUEXT_3DNOW (1<<31)
> >
> > +#ifdef __MMX__
> > +#define cpu_has_mmx 1
> > +#else
> >  #define cpu_has_mmx (_mesa_x86_cpu_features & X86_FEATURE_MMX)
> > +#endif
> > +
> >  #define cpu_has_mmxext (_mesa_x86_cpu_features & X86_FEATURE_MMXEXT)
> > +
> > +#ifdef __SSE__
> > +#define cpu_has_xmm 1
> > +#else
> >  #define cpu_has_xmm (_mesa_x86_cpu_features & X86_FEATURE_XMM)
> > +#endif
> > +
> > +#ifdef __SSE2__
> > +#define cpu_has_xmm2 1
> > +#else
> >  #define cpu_has_xmm2 (_mesa_x86_cpu_features & X86_FEATURE_XMM2)
> > +#endif
> > +
> > +#ifdef __3dNOW__
> > +#define cpu_has_3dnow 1
> > +#else
> >  #define cpu_has_3dnow (_mesa_x86_cpu_features & X86_FEATURE_3DNOW)
> > +#endif
> > +
> >  #define cpu_has_3dnowext (_mesa_x86_cpu_features & X86_FEATURE_3DNOWEXT)
> > +
> > +#ifdef __SSE4_1__
> > +#define cpu_has_sse4_1 1
> > +#else
> >  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
> > +#endif
> >
> >  #endif
> >
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
I know that's a time saver for developers (GCC function multiversioning), however I still prefer the approach (my own ^^) which works on all setups regardless of hardware and compiler (well, any sane compiler: ICC, GCC, Clang, ...).

Best regards,
Siavash Eliasi.
Re: [Mesa-dev] [PATCH V2] mesa: Permanently enable features supported by target CPU at compile time.
On 11/08/2014 09:55 PM, Emil Velikov wrote:
> A comment I could not withhold based on your earlier post - "We require
> micro-benchmark for this code. It will take me hours to find why mesa is
> so slow now :P"

Which raises the question: why didn't you post to that thread/topic in the first place instead? :P

> Ideally mesa should have an infrastructure/farm that handles regressions
> - be that performance or otherwise. Pretty sure some companies have such
> features but those seem to be hidden behind locked doors :'(

Yes, that's unfortunate. But at least we have Phoronix :)

> Can you confirm that it does not cause issues with "interesting" setups
> such as https://bugs.freedesktop.org/show_bug.cgi?id=71547

Challenge accepted! What my patch does is check, at compile time, for the macros defined by the provided compile flags (-msse defines __SSE__, and so on) and set the "cpu_has_sse" macro to "1", which allows any sane compiler to turn this piece of code:

#ifdef USE_SSE
   if (cpu_has_sse) {
      /* SSE code path */
   } else
#endif
   {
      /* C fallback */
   }

into this:

   /* SSE code path */

by using compile-time information about the target CPU.

Best regards,
Siavash Eliasi.
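A minimal, self-contained sketch of the folding described in this message is shown below. It is not Mesa's header: example_cpu_features, EXAMPLE_FEATURE_XMM and example_pick_code_path are names invented for the illustration, and the feature word stands in for whatever the runtime CPU detection would fill in.

```c
/* Hypothetical stand-in for the feature word filled in by runtime
 * CPU detection. */
unsigned example_cpu_features = 0;

/* Hypothetical feature bit for this sketch. */
#define EXAMPLE_FEATURE_XMM (1u << 0)

#ifdef __SSE__
/* The compiler was told the target supports SSE (e.g. via -msse):
 * the check is a constant and the fallback branch can be dropped. */
#define cpu_has_xmm 1
#else
/* Otherwise fall back to testing the runtime-detected feature word. */
#define cpu_has_xmm (example_cpu_features & EXAMPLE_FEATURE_XMM)
#endif

/* Returns 1 when the SSE path is selected, 0 for the C fallback. */
int example_pick_code_path(void)
{
   if (cpu_has_xmm) {
      return 1; /* SSE code path */
   } else {
      return 0; /* C fallback */
   }
}
```

When __SSE__ is defined the condition is the literal 1, so an optimising compiler is free to discard the else branch entirely; without it the function still works, at the cost of one load and one test.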
[Mesa-dev] [PATCH 4/5] auxiliary/vl: split the vl sources list into VL_SOURCES
With follow up commit we'll split vl static lib from the auxiliary one, and choose the appropriate vl (galliumvl or galliumvl_stub) for the respective targets to link against. Cc: Christian König Signed-off-by: Emil Velikov --- src/gallium/auxiliary/Android.mk | 4 +++- src/gallium/auxiliary/Makefile.am | 1 + src/gallium/auxiliary/Makefile.sources | 41 ++ src/gallium/auxiliary/SConscript | 1 + 4 files changed, 37 insertions(+), 10 deletions(-) diff --git a/src/gallium/auxiliary/Android.mk b/src/gallium/auxiliary/Android.mk index 8046943..2e7d7a8 100644 --- a/src/gallium/auxiliary/Android.mk +++ b/src/gallium/auxiliary/Android.mk @@ -28,7 +28,9 @@ include $(LOCAL_PATH)/Makefile.sources include $(CLEAR_VARS) -LOCAL_SRC_FILES := $(C_SOURCES) +LOCAL_SRC_FILES := \ + $(C_SOURCES) \ + $(VL_SOURCES) LOCAL_C_INCLUDES := \ $(GALLIUM_TOP)/auxiliary/util \ diff --git a/src/gallium/auxiliary/Makefile.am b/src/gallium/auxiliary/Makefile.am index 1e268b2..1e18e6e 100644 --- a/src/gallium/auxiliary/Makefile.am +++ b/src/gallium/auxiliary/Makefile.am @@ -18,6 +18,7 @@ AM_CXXFLAGS = $(VISIBILITY_CXXFLAGS) libgallium_la_SOURCES = \ $(C_SOURCES) \ + $(VL_SOURCES) \ $(GENERATED_SOURCES) if HAVE_MESA_LLVM diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index 9625ee5..66edb4d 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -144,20 +144,43 @@ C_SOURCES := \ util/u_transfer.c \ util/u_resource.c \ util/u_upload_mgr.c \ - util/u_vbuf.c \ - vl/vl_csc.c \ + util/u_vbuf.c + +VL_SOURCES := \ vl/vl_compositor.c \ + vl/vl_compositor.h \ + vl/vl_csc.c \ + vl/vl_csc.h \ + vl/vl_decoder.c \ + vl/vl_decoder.h \ + vl/vl_defines.h \ + vl/vl_deint_filter.c \ + vl/vl_deint_filter.h \ + vl/vl_idct.c \ + vl/vl_idct.h \ vl/vl_matrix_filter.c \ + vl/vl_matrix_filter.h \ + vl/vl_mc.c \ + vl/vl_mc.h \ vl/vl_median_filter.c \ - vl/vl_decoder.c \ - vl/vl_mpeg12_decoder.c \ + vl/vl_median_filter.h \ 
vl/vl_mpeg12_bitstream.c \ + vl/vl_mpeg12_bitstream.h \ + vl/vl_mpeg12_decoder.c \ + vl/vl_mpeg12_decoder.h \ + vl/vl_rbsp.h \ + vl/vl_types.h \ + vl/vl_vertex_buffers.c \ + vl/vl_vertex_buffers.h \ + vl/vl_video_buffer.c \ + vl/vl_video_buffer.h \ + vl/vl_vlc.h \ vl/vl_zscan.c \ -vl/vl_idct.c \ - vl/vl_mc.c \ -vl/vl_vertex_buffers.c \ -vl/vl_video_buffer.c \ - vl/vl_deint_filter.c + vl/vl_zscan.h + +# XXX: Add those to VL_SOURCES once we've split it out of libgallium +# vl/vl_winsys_dri.c \ +# vl/vl_winsys.h \ VL_STUB_SOURCES := \ vl/vl_stubs.c diff --git a/src/gallium/auxiliary/SConscript b/src/gallium/auxiliary/SConscript index 94041d2..81c4f4c 100644 --- a/src/gallium/auxiliary/SConscript +++ b/src/gallium/auxiliary/SConscript @@ -36,6 +36,7 @@ env.Depends('util/u_format_table.c', [ source = env.ParseSourceList('Makefile.sources', [ 'C_SOURCES', +'VL_SOURCES', 'GENERATED_SOURCES' ]) -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/5] Use vl_stub for the dri/egl/... gallium targets
Hello all,

Here is a reworked version of a patch I sent a while back - it creates a stub for the vl functions used directly by the gallium drivers. At link time we use it for non-vl targets, while for vdpau and friends we use a galliumvl static lib, which is split out of auxiliary.

This results in:
- Four fewer automake warnings
- Small deduplication in the target Makefiles
- Some nice size savings of the resulting modules

   text    data     bss     dec     hex filename
5850573  187549 1977928 8016050  7a50b2 before/nouveau_dri.so
5508486  187100  391240 6086826  5ce0aa after/nouveau_dri.so

As usual the branch can be found in my github repo at github.com/evelikov/mesa/tree/stub-vl

Any comments, suggestions and testing would be appreciated.

Cheers,
Emil
[Mesa-dev] [PATCH 3/5] auxiliary/vl: add galliumvl_stub.la
Will be used by the non-VL targets, to stub out the functions called by the drivers. The entry point to those are within the VL state-trackers, yet the compiler cannot determine that at link time. Thus we'll need to stub them out to prevent unresolved symbols in the dri, egl, gbm and pipe-loader targets. Cc: Christian König Signed-off-by: Emil Velikov --- src/gallium/auxiliary/Makefile.am | 5 ++ src/gallium/auxiliary/Makefile.sources | 3 + src/gallium/auxiliary/vl/vl_stubs.c| 147 + 3 files changed, 155 insertions(+) create mode 100644 src/gallium/auxiliary/vl/vl_stubs.c diff --git a/src/gallium/auxiliary/Makefile.am b/src/gallium/auxiliary/Makefile.am index 4d8ba89..1e268b2 100644 --- a/src/gallium/auxiliary/Makefile.am +++ b/src/gallium/auxiliary/Makefile.am @@ -46,3 +46,8 @@ indices/u_unfilled_gen.c: $(srcdir)/indices/u_unfilled_gen.py util/u_format_table.c: $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format_pack.py $(srcdir)/util/u_format_parse.py $(srcdir)/util/u_format.csv $(AM_V_at)$(MKDIR_P) util $(AM_V_GEN) $(PYTHON2) $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format.csv > $@ + + +noinst_LTLIBRARIES += libgalliumvl_stub.la +libgalliumvl_stub_la_SOURCES = \ + $(VL_STUB_SOURCES) diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index f6621ef..9625ee5 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -159,6 +159,9 @@ C_SOURCES := \ vl/vl_video_buffer.c \ vl/vl_deint_filter.c +VL_STUB_SOURCES := \ + vl/vl_stubs.c + GENERATED_SOURCES := \ indices/u_indices_gen.c \ indices/u_unfilled_gen.c \ diff --git a/src/gallium/auxiliary/vl/vl_stubs.c b/src/gallium/auxiliary/vl/vl_stubs.c new file mode 100644 index 000..d690eca --- /dev/null +++ b/src/gallium/auxiliary/vl/vl_stubs.c @@ -0,0 +1,147 @@ +#include + +#include "vl_decoder.h" +#include "vl_mpeg12_bitstream.h" +#include "vl_mpeg12_decoder.h" +#include "vl_video_buffer.h" +#include "vl_zscan.h" + + +/* + * 
vl_decoder stubs + */ +bool +vl_profile_supported(struct pipe_screen *screen, + enum pipe_video_profile profile, + enum pipe_video_entrypoint entrypoint) +{ + assert(0); + return false; +} + +int +vl_level_supported(struct pipe_screen *screen, + enum pipe_video_profile profile) +{ + assert(0); + return 0; +} + +struct pipe_video_codec * +vl_create_decoder(struct pipe_context *pipe, + const struct pipe_video_codec *templat) +{ + assert(0); + return NULL; +} + + +/* + * vl_video_buffer stubs + */ +const enum pipe_format * +vl_video_buffer_formats(struct pipe_screen *screen, enum pipe_format format) +{ + assert(0); + return NULL; +} + +boolean +vl_video_buffer_is_format_supported(struct pipe_screen *screen, +enum pipe_format format, +enum pipe_video_profile profile, +enum pipe_video_entrypoint entrypoint) +{ + assert(0); + return false; +} + +unsigned +vl_video_buffer_max_size(struct pipe_screen *screen) +{ + assert(0); + return 0; +} + +void +vl_video_buffer_set_associated_data(struct pipe_video_buffer *vbuf, +struct pipe_video_codec *vcodec, +void *associated_data, +void (*destroy_associated_data)(void *)) +{ + assert(0); +} + +void * +vl_video_buffer_get_associated_data(struct pipe_video_buffer *vbuf, +struct pipe_video_codec *vcodec) +{ + assert(0); + return NULL; +} + +void +vl_video_buffer_template(struct pipe_resource *templ, + const struct pipe_video_buffer *tmpl, + enum pipe_format resource_format, + unsigned depth, unsigned array_size, + unsigned usage, unsigned plane) +{ + assert(0); +} + +struct pipe_video_buffer * +vl_video_buffer_create(struct pipe_context *pipe, + const struct pipe_video_buffer *tmpl) +{ + assert(0); + return NULL; +} + +struct pipe_video_buffer * +vl_video_buffer_create_ex2(struct pipe_context *pipe, + const struct pipe_video_buffer *tmpl, + struct pipe_resource *resources[VL_NUM_COMPONENTS]) +{ + assert(0); + return NULL; +} + + +/* + * vl_mpeg12_bitstream stubs + */ +void +vl_mpg12_bs_init(struct vl_mpg12_bs *bs, struct 
pipe_video_codec *decoder) +{ + assert(0); +} + +void +vl_mpg12_bs_decode(struct vl_mpg12_bs *bs, + struct pipe_video_buffer *target, + struct pipe_mpeg12_picture_desc *picture, + unsigned num_buffers, + const void * const *buffers, + const unsigned *sizes) +{
[Mesa-dev] [PATCH 5/5] auxiliary/vl: rework the build of the VL code
Rather than shoving all the VL code into non-VL targets, increasing their size, just split it out and use it when needed. This gives us the side effect of building vl_winsys_dri.c once, dropping a few automake warnings, and reducing the size of the dri modules as below:

   text    data     bss     dec     hex filename
5850573  187549 1977928 8016050  7a50b2 before/nouveau_dri.so
5508486  187100  391240 6086826  5ce0aa after/nouveau_dri.so

The above data is for a nouveau + swrast + kms_swrast 'megadriver'.

Cc: Christian König
Signed-off-by: Emil Velikov
--- src/gallium/auxiliary/Android.mk| 2 +- src/gallium/auxiliary/Makefile.am | 22 ++ src/gallium/auxiliary/Makefile.sources | 6 ++ src/gallium/auxiliary/SConscript| 2 +- src/gallium/targets/dri/Makefile.am | 1 + src/gallium/targets/egl-static/Makefile.am | 1 + src/gallium/targets/gbm/Makefile.am | 1 + src/gallium/targets/omx/Makefile.am | 11 +++ src/gallium/targets/pipe-loader/Makefile.am | 14 -- src/gallium/targets/va/Makefile.am | 11 +++ src/gallium/targets/vdpau/Makefile.am | 11 +++ src/gallium/targets/xa/Makefile.am | 1 + src/gallium/targets/xvmc/Makefile.am| 11 +++ 13 files changed, 54 insertions(+), 40 deletions(-) diff --git a/src/gallium/auxiliary/Android.mk b/src/gallium/auxiliary/Android.mk index 2e7d7a8..0bc1831 100644 --- a/src/gallium/auxiliary/Android.mk +++ b/src/gallium/auxiliary/Android.mk @@ -30,7 +30,7 @@ include $(CLEAR_VARS) LOCAL_SRC_FILES := \ $(C_SOURCES) \ - $(VL_SOURCES) + $(VL_STUB_SOURCES) LOCAL_C_INCLUDES := \ $(GALLIUM_TOP)/auxiliary/util \ diff --git a/src/gallium/auxiliary/Makefile.am b/src/gallium/auxiliary/Makefile.am index 1e18e6e..69ae31f 100644 --- a/src/gallium/auxiliary/Makefile.am +++ b/src/gallium/auxiliary/Makefile.am @@ -52,3 +52,25 @@ util/u_format_table.c: $(srcdir)/util/u_format_table.py $(srcdir)/util/u_format_ noinst_LTLIBRARIES += libgalliumvl_stub.la libgalliumvl_stub_la_SOURCES = \ $(VL_STUB_SOURCES) + +if NEED_GALLIUM_VL + +noinst_LTLIBRARIES += libgalliumvl.la + +libgalliumvl_la_CFLAGS = \
+ $(AM_CFLAGS) \ + $(VL_CFLAGS) \ + $(LIBDRM_CFLAGS) \ + $(GALLIUM_PIPE_LOADER_DEFINES) \ + -DPIPE_SEARCH_DIR=\"$(libdir)/gallium-pipe\" + +if HAVE_GALLIUM_STATIC_TARGETS +libgalliumvl_la_CFLAGS += \ + -DGALLIUM_STATIC_TARGETS=1 + +endif # HAVE_GALLIUM_STATIC_TARGETS + +libgalliumvl_la_SOURCES = \ + $(VL_SOURCES) + +endif diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index 66edb4d..f3b95b9 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -175,13 +175,11 @@ VL_SOURCES := \ vl/vl_video_buffer.c \ vl/vl_video_buffer.h \ vl/vl_vlc.h \ + vl/vl_winsys.h \ + vl/vl_winsys_dri.c \ vl/vl_zscan.c \ vl/vl_zscan.h -# XXX: Add those to VL_SOURCES once we've split it out of libgallium -# vl/vl_winsys_dri.c \ -# vl/vl_winsys.h \ - VL_STUB_SOURCES := \ vl/vl_stubs.c diff --git a/src/gallium/auxiliary/SConscript b/src/gallium/auxiliary/SConscript index 81c4f4c..4984434 100644 --- a/src/gallium/auxiliary/SConscript +++ b/src/gallium/auxiliary/SConscript @@ -36,7 +36,7 @@ env.Depends('util/u_format_table.c', [ source = env.ParseSourceList('Makefile.sources', [ 'C_SOURCES', -'VL_SOURCES', +'VL_STUB_SOURCES', 'GENERATED_SOURCES' ]) diff --git a/src/gallium/targets/dri/Makefile.am b/src/gallium/targets/dri/Makefile.am index 1094ffd..898ab46 100644 --- a/src/gallium/targets/dri/Makefile.am +++ b/src/gallium/targets/dri/Makefile.am @@ -43,6 +43,7 @@ gallium_dri_la_LIBADD = \ $(top_builddir)/src/mesa/drivers/dri/common/libdricommon.la \ $(top_builddir)/src/mesa/drivers/dri/common/libmegadriver_stub.la \ $(top_builddir)/src/gallium/state_trackers/dri/libdri.la \ + $(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \ $(top_builddir)/src/gallium/auxiliary/libgallium.la \ $(top_builddir)/src/gallium/drivers/galahad/libgalahad.la \ $(top_builddir)/src/gallium/drivers/noop/libnoop.la \ diff --git a/src/gallium/targets/egl-static/Makefile.am b/src/gallium/targets/egl-static/Makefile.am index 
3f0e650..b188f82 100644 --- a/src/gallium/targets/egl-static/Makefile.am +++ b/src/gallium/targets/egl-static/Makefile.am @@ -63,6 +63,7 @@ egl_gallium_la_SOURCES = \ egl_gallium_la_LIBADD = \ $(top_builddir)/src/loader/libloader.la \ + $(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \ $(top_builddir)/src/gallium/auxiliary/libgallium.la \ $(top_builddir)/src/gallium/drivers/identity/libidentity.la \ $(top_builddir)/src/gallium/drivers/trace/libtrace.la \ diff --git a/src/gallium/targets
[Mesa-dev] [PATCH 1/5] configure: check the package version when auto-detecting the VL targets
Otherwise we might end up automatically enabling the build, only to error out a couple of lines after that.

Cc: Christian König
Signed-off-by: Emil Velikov
--- configure.ac | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/configure.ac b/configure.ac index fc7d372..d073bab 100644 --- a/configure.ac +++ b/configure.ac @@ -39,6 +39,7 @@ PRESENTPROTO_REQUIRED=1.0 LIBUDEV_REQUIRED=151 GLPROTO_REQUIRED=1.4.14 LIBOMXIL_BELLAGIO_REQUIRED=0.0 +LIBVA_REQUIRED=0.35.0 VDPAU_REQUIRED=0.4.1 WAYLAND_REQUIRED=1.2.0 XCB_REQUIRED=1.9.3 @@ -1402,19 +1403,19 @@ dnl Gallium G3DVL configuration dnl if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then if test "x$enable_xvmc" = xauto; then - PKG_CHECK_EXISTS([xvmc], [enable_xvmc=yes], [enable_xvmc=no]) + PKG_CHECK_EXISTS([xvmc >= $XVMC_REQUIRED], [enable_xvmc=yes], [enable_xvmc=no]) fi if test "x$enable_vdpau" = xauto; then - PKG_CHECK_EXISTS([vdpau], [enable_vdpau=yes], [enable_vdpau=no]) + PKG_CHECK_EXISTS([vdpau >= $VDPAU_REQUIRED], [enable_vdpau=yes], [enable_vdpau=no]) fi if test "x$enable_omx" = xauto; then - PKG_CHECK_EXISTS([libomxil-bellagio], [enable_omx=yes], [enable_omx=no]) + PKG_CHECK_EXISTS([libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED], [enable_omx=yes], [enable_omx=no]) fi if test "x$enable_va" = xauto; then -PKG_CHECK_EXISTS([libva], [enable_va=yes], [enable_va=no]) +PKG_CHECK_EXISTS([libva >= $LIBVA_REQUIRED], [enable_va=yes], [enable_va=no]) fi fi @@ -1438,7 +1439,7 @@ fi AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes) if test "x$enable_va" = xyes; then -PKG_CHECK_MODULES([VA], [libva >= 0.35.0 x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED], +PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED], [VA_LIBS="`$PKG_CONFIG --libs x11-xcb xcb-dri2`"]) enable_gallium_loader=$enable_shared_pipe_drivers fi -- 2.1.3
[Mesa-dev] [PATCH 2/5] automake: rework VL dependency tracking
Set a single VL_{CFLAGS,LIBS} for xcb and friends, and let each target check for its relevant library alone. This is required because with follow-up commits we'll build aux/vl into a separate module, which needs VL_CFLAGS.

As a cleanup, add a couple of explicit LIBDRM_LIBS links, as aux/vl itself requires libdrm, even though LIBDRM_{RADEON,NOUVEAU...} may provide it as well.

Cc: Christian König
Signed-off-by: Emil Velikov
--- configure.ac | 19 +-- src/gallium/state_trackers/omx/Makefile.am | 1 + src/gallium/state_trackers/va/Makefile.am| 1 + src/gallium/state_trackers/vdpau/Makefile.am | 1 + src/gallium/state_trackers/xvmc/Makefile.am | 2 ++ src/gallium/targets/omx/Makefile.am | 3 +++ src/gallium/targets/va/Makefile.am | 5 - src/gallium/targets/vdpau/Makefile.am| 4 +++- src/gallium/targets/xvmc/Makefile.am | 2 ++ 9 files changed, 30 insertions(+), 8 deletions(-) diff --git a/configure.ac b/configure.ac index d073bab..c7d74a0 100644 --- a/configure.ac +++ b/configure.ac @@ -1419,28 +1419,35 @@ if test -n "$with_gallium_drivers" -a "x$with_gallium_drivers" != xswrast; then fi fi +if test "x$enable_xvmc" = xyes -o \ +"x$enable_vdpau" = xyes -o \ +"x$enable_omx" = xyes -o \ +"x$enable_va" = xyes; then +PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED]) +need_gallium_vl=yes +fi +AM_CONDITIONAL(NEED_GALLIUM_VL, test "x$need_gallium_vl" = xyes) + if test "x$enable_xvmc" = xyes; then -PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED]) +PKG_CHECK_MODULES([XVMC], [xvmc >= $XVMC_REQUIRED]) enable_gallium_loader=$enable_shared_pipe_drivers fi AM_CONDITIONAL(HAVE_ST_XVMC, test "x$enable_xvmc" = xyes) if test "x$enable_vdpau" = xyes; then -PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED], - [VDPAU_LIBS="`$PKG_CONFIG --libs x11-xcb xcb xcb-dri2`"]) +PKG_CHECK_MODULES([VDPAU], [vdpau >= $VDPAU_REQUIRED]) enable_gallium_loader=$enable_shared_pipe_drivers fi AM_CONDITIONAL(HAVE_ST_VDPAU, test
"x$enable_vdpau" = xyes) if test "x$enable_omx" = xyes; then -PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED]) +PKG_CHECK_MODULES([OMX], [libomxil-bellagio >= $LIBOMXIL_BELLAGIO_REQUIRED]) enable_gallium_loader=$enable_shared_pipe_drivers fi AM_CONDITIONAL(HAVE_ST_OMX, test "x$enable_omx" = xyes) if test "x$enable_va" = xyes; then -PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED x11-xcb xcb-dri2 >= $XCBDRI2_REQUIRED], - [VA_LIBS="`$PKG_CONFIG --libs x11-xcb xcb-dri2`"]) +PKG_CHECK_MODULES([VA], [libva >= $LIBVA_REQUIRED]) enable_gallium_loader=$enable_shared_pipe_drivers fi AM_CONDITIONAL(HAVE_ST_VA, test "x$enable_va" = xyes) diff --git a/src/gallium/state_trackers/omx/Makefile.am b/src/gallium/state_trackers/omx/Makefile.am index 68eed02..d68746c 100644 --- a/src/gallium/state_trackers/omx/Makefile.am +++ b/src/gallium/state_trackers/omx/Makefile.am @@ -26,6 +26,7 @@ include $(top_srcdir)/src/gallium/Automake.inc AM_CFLAGS = \ $(GALLIUM_CFLAGS) \ $(VISIBILITY_CFLAGS) \ + $(VL_CFLAGS) \ $(OMX_CFLAGS) noinst_LTLIBRARIES = libomxtracker.la diff --git a/src/gallium/state_trackers/va/Makefile.am b/src/gallium/state_trackers/va/Makefile.am index ec64c3f..2a93a90 100644 --- a/src/gallium/state_trackers/va/Makefile.am +++ b/src/gallium/state_trackers/va/Makefile.am @@ -26,6 +26,7 @@ include $(top_srcdir)/src/gallium/Automake.inc AM_CFLAGS = \ $(GALLIUM_CFLAGS) \ $(VISIBILITY_CFLAGS) \ + $(VL_CFLAGS) \ $(VA_CFLAGS) \ -DVA_DRIVER_INIT_FUNC="__vaDriverInit_$(VA_MAJOR)_$(VA_MINOR)" diff --git a/src/gallium/state_trackers/vdpau/Makefile.am b/src/gallium/state_trackers/vdpau/Makefile.am index a74b5bf..d7fd68b 100644 --- a/src/gallium/state_trackers/vdpau/Makefile.am +++ b/src/gallium/state_trackers/vdpau/Makefile.am @@ -29,6 +29,7 @@ VDPAU_MINOR = 0 AM_CFLAGS = \ $(GALLIUM_CFLAGS) \ $(VISIBILITY_CFLAGS) \ + $(VL_CFLAGS) \ $(VDPAU_CFLAGS) AM_CPPFLAGS = \ -I$(top_srcdir)/include \ diff --git 
a/src/gallium/state_trackers/xvmc/Makefile.am b/src/gallium/state_trackers/xvmc/Makefile.am index abaa88e..9f51280 100644 --- a/src/gallium/state_trackers/xvmc/Makefile.am +++ b/src/gallium/state_trackers/xvmc/Makefile.am @@ -26,6 +26,8 @@ include $(top_srcdir)/src/gallium/Automake.inc AM_CFLAGS = \ $(GALLIUM_CFLAGS) \ + $(VISIBILITY_CFLAGS) \ + $(VL_CFLAGS) \ $(XVMC_CFLAGS) noinst_LTLIBRARIES = libxvmctracker.la diff --git a/src/gallium/targets/omx/Makefile.am b/src/gallium/targets/omx/Makefile.am index 4be1063..5f4106c 100644 --- a/src/gallium/targets/omx/Makefile.am +++ b/src/ga
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
On Sun, 2014-11-09 at 08:59 +1100, Timothy Arceri wrote:
> On Sat, 2014-11-08 at 18:13 +, Emil Velikov wrote:
> > On 08/11/14 11:12, Timothy Arceri wrote:
> > > Signed-off-by: Timothy Arceri
> > As long as it fixes odd combinations such as the following I'm all
> > in favour of using such an approach. It will save us quite a few
> > "lovely" details - split the file, configure checks etc...
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=71547
> >
> Hmmm, what a pain. I'm not sure what "GCC target" will do in this case,
> will need to check.

OK, so for the intel driver it doesn't build with this type of setup anyway: as intel_mipmap_tree.c still uses #if defined(USE_SSE41), you end up with an undefined symbol, _mesa_streaming_load_memcpy, at runtime. Once I removed the calls to _mesa_streaming_load_memcpy, it seems GCC target overrides the no-ssse3 flag, so it seems it will be fine as long as a runtime check is in place.

>
> It will be a shame if we can't enable these optimisations just because
> people wish to build in this way.
> >
> > Just a small nit below :)
> >
> > Thanks
> > Emil
> >
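The combination discussed in this message (a per-function "GCC target" plus a runtime check) can be sketched as follows. This is an illustration only, not the helper macros from the patch series: the function names are invented, the guard macro EXAMPLE_HAVE_SSE41_PATH is hypothetical, and the sketch assumes a GCC-compatible compiler recent enough to allow SSE4.1 intrinsics inside a function carrying the target attribute without -msse4.1 on the command line.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>
#define EXAMPLE_HAVE_SSE41_PATH 1

/* Only this function is compiled with SSE4.1 enabled. */
__attribute__((target("sse4.1")))
static uint32_t example_max_u32_sse41(const uint32_t *v, size_t count)
{
   __m128i best = _mm_setzero_si128();
   uint32_t lanes[4];
   uint32_t result = 0;
   size_t i = 0;
   int k;

   /* Four values per iteration; _mm_max_epu32 is an SSE4.1 intrinsic. */
   for (; i + 4 <= count; i += 4) {
      __m128i chunk = _mm_loadu_si128((const __m128i *)(v + i));
      best = _mm_max_epu32(best, chunk);
   }

   /* Reduce the four lanes, then handle the leftover elements. */
   _mm_storeu_si128((__m128i *)lanes, best);
   for (k = 0; k < 4; k++) {
      if (lanes[k] > result)
         result = lanes[k];
   }
   for (; i < count; i++) {
      if (v[i] > result)
         result = v[i];
   }
   return result;
}
#endif

/* Plain C fallback; returns 0 for an empty array. */
static uint32_t example_max_u32_c(const uint32_t *v, size_t count)
{
   uint32_t result = 0;
   size_t i;

   for (i = 0; i < count; i++) {
      if (v[i] > result)
         result = v[i];
   }
   return result;
}

/* Dispatcher: the SSE4.1 version is only called after a runtime check. */
uint32_t example_max_u32(const uint32_t *v, size_t count)
{
#ifdef EXAMPLE_HAVE_SSE41_PATH
   if (__builtin_cpu_supports("sse4.1"))
      return example_max_u32_sse41(v, count);
#endif
   return example_max_u32_c(v, count);
}
```

The runtime check is what keeps this safe: the attribute lets the compiler emit SSE4.1 instructions regardless of the global flags, so calling the function on a CPU without SSE4.1 would fault.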
Re: [Mesa-dev] [PATCH 4/5] i965/fs: Dead code eliminate instructions writing the flag.
On Wednesday, October 29, 2014 02:10:12 PM Matt Turner wrote:
> Most prominently helps Natural Selection 2, which has a surprising
> number of shaders that do very complicated things before drawing black.
>
> instructions in affected programs: 23824 -> 19570 (-17.86%)
> ---
> .../dri/i965/brw_fs_dead_code_eliminate.cpp | 23 +++---
> 1 file changed, 20 insertions(+), 3 deletions(-)

(requoting the diff to add more context...)

> diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> index 9cf8d89..c5f5ede 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp
> @@ -21,93 +21,111 @@
>   * IN THE SOFTWARE.
>   */
>
>  #include "brw_fs.h"
>  #include "brw_fs_live_variables.h"
>  #include "brw_cfg.h"
>
>  /** @file brw_fs_dead_code_eliminate.cpp
>   *
>   * Dataflow-aware dead code elimination.
>   *
>   * Walks the instruction list from the bottom, removing instructions that
>   * have results that both aren't used in later blocks and haven't been read
>   * yet in the tail end of this block.
>   */
>
>  bool
>  fs_visitor::dead_code_eliminate()
>  {
>     bool progress = false;
>
>     calculate_live_intervals();
>
>     int num_vars = live_intervals->num_vars;
>     BITSET_WORD *live = ralloc_array(NULL, BITSET_WORD, BITSET_WORDS(num_vars));
> +   BITSET_WORD *flag_live = ralloc_array(NULL, BITSET_WORD, 1);
>
>     foreach_block (block, cfg) {
>        memcpy(live, live_intervals->block_data[block->num].liveout,
>               sizeof(BITSET_WORD) * BITSET_WORDS(num_vars));
> +      memcpy(flag_live, live_intervals->block_data[block->num].flag_liveout,
> +             sizeof(BITSET_WORD));
>
>        foreach_inst_in_block_reverse(fs_inst, inst, block) {
> -         if (inst->dst.file == GRF &&
> -             !inst->has_side_effects() &&
> -             !inst->writes_flag()) {
> +         if (inst->dst.file == GRF && !inst->has_side_effects()) {
>              bool result_live = false;

This seems wrong to me. Instructions handled here must have a destination, but now can also write the flag...yet...

>              if (inst->regs_written == 1) {
>                 int var = live_intervals->var_from_reg(&inst->dst);
>                 result_live = BITSET_TEST(live, var);
>              } else {
>                 int var = live_intervals->var_from_reg(&inst->dst);
>                 for (int i = 0; i < inst->regs_written; i++) {
>                    result_live = result_live || BITSET_TEST(live, var + i);
>                 }
>              }
>
>              if (!result_live) {
>                 progress = true;
>
>                 if (inst->writes_accumulator) {
>                    inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type));
>                 } else {
>                    inst->opcode = BRW_OPCODE_NOP;
>                    continue;

...here you NOP the instruction, without considering whether the flag value is live. I think you meant to change the (inst->writes_accumulator) check to be

   if (inst->writes_accumulator || inst->writes_flag())

That way, you just make the destination NULL, but leave the instruction generating the flag register...

>                 }
>              }
>           }
>
> +         if (inst->dst.is_null() && inst->writes_flag()) {
> +            if (!BITSET_TEST(flag_live, inst->flag_subreg)) {
> +               inst->opcode = BRW_OPCODE_NOP;
> +               progress = true;
> +               continue;
> +            }
> +         }

...which this code block would clean up, NOP'ing instructions with a NULL destination and unused flag value.

With that fixed, it looks good to me.

Reviewed-by: Kenneth Graunke

> +
>
>           if (inst->dst.file == GRF) {
>              if (!inst->is_partial_write()) {
>                 int var = live_intervals->var_from_reg(&inst->dst);
>                 for (int i = 0; i < inst->regs_written; i++) {
>                    BITSET_CLEAR(live, var + i);
>                 }
>              }
>           }
>
> +         if (inst->writes_flag()) {
> +            BITSET_CLEAR(flag_live, inst->flag_subreg);
> +         }
> +
>
>           for (int i = 0; i < inst->sources; i++) {
>              if (inst->src[i].file == GRF) {
>                 int var = live_intervals->var_from_reg(&inst->src[i]);
>                 for (int j = 0; j < inst->regs_read(this, i); j++) {
>                    BITSET_SET(live, var + j);
>                 }
>              }
>           }
>
> +
> +         if (inst->reads_flag()) {
> +            BITSET_SET(flag_live, inst->flag_subreg);
> +         }
>        }
>     }
>
>     ralloc_free(live);
> +   ralloc_free(flag_live);
>
>     if (progress) {
>        foreach_block_and_inst_safe (block, backend_instruction, inst, cfg) {
>           if (inst->opcode == BR
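The control flow suggested in this review can be modelled with a toy single-block pass, shown below as a sketch under simplifying assumptions rather than the i965 code: all names are invented, registers are bits in one unsigned word (so indices must stay below 32), there is a single flag, and partial writes, the accumulator and side effects are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

enum { TOY_REG_NULL = -1, TOY_MAX_SRCS = 2 };

struct toy_inst {
   bool is_nop;
   int dst;                 /* register index, or TOY_REG_NULL */
   int src[TOY_MAX_SRCS];   /* register indices, or TOY_REG_NULL */
   bool writes_flag;
   bool reads_flag;
};

/* Backward walk over one block. live_out has bit n set when register n
 * is live after the block. Returns true if anything was changed. */
bool toy_dead_code_eliminate(struct toy_inst *insts, size_t count,
                             unsigned live_out, bool flag_live_out)
{
   unsigned live = live_out;
   bool flag_live = flag_live_out;
   bool progress = false;
   size_t n;
   int s;

   for (n = count; n-- > 0; ) {
      struct toy_inst *inst = &insts[n];

      if (inst->is_nop)
         continue;

      if (inst->dst != TOY_REG_NULL && !(live & (1u << inst->dst))) {
         progress = true;
         if (inst->writes_flag) {
            /* Result is dead but the flag may be needed: only drop
             * the destination and keep the instruction. */
            inst->dst = TOY_REG_NULL;
         } else {
            inst->is_nop = true;
            continue;
         }
      }

      /* Null destination and a dead flag value: nothing left to keep. */
      if (inst->dst == TOY_REG_NULL && inst->writes_flag && !flag_live) {
         inst->is_nop = true;
         progress = true;
         continue;
      }

      /* Update liveness: writes kill, reads make live. */
      if (inst->dst != TOY_REG_NULL)
         live &= ~(1u << inst->dst);
      if (inst->writes_flag)
         flag_live = false;
      for (s = 0; s < TOY_MAX_SRCS; s++) {
         if (inst->src[s] != TOY_REG_NULL)
            live |= 1u << inst->src[s];
      }
      if (inst->reads_flag)
         flag_live = true;
   }

   return progress;
}
```

For a compare that writes both a dead register and the flag, followed by a select that reads the flag, the pass keeps the compare with a null destination; if nothing reads the flag afterwards, the same compare becomes a NOP.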
Re: [Mesa-dev] [PATCH 3/5] i965/fs: Track liveness of the flag register.
On Wednesday, October 29, 2014 03:58:13 PM Matt Turner wrote:
> On Wed, Oct 29, 2014 at 2:10 PM, Matt Turner wrote:
> > ---
> > .../drivers/dri/i965/brw_fs_live_variables.cpp | 35 ++
> > src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 5
> > 2 files changed, 40 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
> > index ab81e94..dbe1d34 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp
> > @@ -157,6 +157,18 @@ fs_live_variables::setup_def_use()
> >                 reg.reg_offset++;
> >              }
> >           }
> > +         if (inst->reads_flag()) {
> > +            /* The vertical combination predicates read f0.0 and f0.1. */
> > +            if (inst->predicate == BRW_PREDICATE_ALIGN1_ANYV ||
> > +                inst->predicate == BRW_PREDICATE_ALIGN1_ALLV) {
> > +               if (!BITSET_TEST(bd->flag_def, 1 - inst->flag_subreg)) {
> > +                  BITSET_SET(bd->flag_use, 1 - inst->flag_subreg);
>
> Since we don't expect (+f0.1.allv) to work (i.e., vertical predicates
> with a subregister of 1), maybe I should just assert(inst->flag_subreg
> == 0) and then do BITSET_*(..., 1) here.

I don't know if (+f0.1.allv) works or not, but it certainly seems easy enough to generate (+f0.0.allv) instead. I like your assert and BITSET_*(..., 1) plan.

Either way, patches 1-3 are:
Reviewed-by: Kenneth Graunke
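The "assert and BITSET_*(..., 1)" plan agreed on in this message could be sketched like this. It is a simplified model, not the driver code: plain unsigned masks stand in for the BITSET_* macros, and the type and function names are invented for the illustration.

```c
#include <assert.h>

enum toy_predicate {
   TOY_PREDICATE_NORMAL,
   TOY_PREDICATE_ANYV,
   TOY_PREDICATE_ALLV
};

struct toy_block_data {
   unsigned flag_def; /* bit n set: f0.n is written in the block before being read */
   unsigned flag_use; /* bit n set: f0.n is read in the block before being written */
};

/* Record the flag subregisters read by one predicated instruction. */
void toy_record_flag_read(struct toy_block_data *bd,
                          enum toy_predicate predicate,
                          unsigned flag_subreg)
{
   if (predicate == TOY_PREDICATE_ANYV || predicate == TOY_PREDICATE_ALLV) {
      /* Vertical predicates read f0.0 and f0.1; only a subregister of 0
       * is expected here, so the other one is always f0.1. */
      assert(flag_subreg == 0);
      if (!(bd->flag_def & (1u << 1)))
         bd->flag_use |= 1u << 1;
   }

   /* The subregister named by the instruction is read in every case. */
   if (!(bd->flag_def & (1u << flag_subreg)))
      bd->flag_use |= 1u << flag_subreg;
}
```

Compared with computing 1 - flag_subreg, the assert makes the unsupported case fail loudly instead of silently marking f0.0 as used.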
Re: [Mesa-dev] [PATCH 5/5] i965/fs: Use const fs_reg & rather than a copy or pointer.
On Wednesday, October 29, 2014 02:10:13 PM Matt Turner wrote: > Also while we're touching var_from_reg, just make it an inline function. > --- > src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 8 > src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 14 -- > src/mesa/drivers/dri/i965/brw_fs_live_variables.h | 11 --- > src/mesa/drivers/dri/i965/brw_fs_saturate_propagation.cpp | 2 +- > 4 files changed, 17 insertions(+), 18 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > index 414c4a0..2b26177 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > @@ -56,10 +56,10 @@ fs_visitor::dead_code_eliminate() > bool result_live = false; > > if (inst->regs_written == 1) { > - int var = live_intervals->var_from_reg(&inst->dst); > + int var = live_intervals->var_from_reg(inst->dst); > result_live = BITSET_TEST(live, var); > } else { > - int var = live_intervals->var_from_reg(&inst->dst); > + int var = live_intervals->var_from_reg(inst->dst); > for (int i = 0; i < inst->regs_written; i++) { >result_live = result_live || BITSET_TEST(live, var + i); > } > @@ -86,7 +86,7 @@ fs_visitor::dead_code_eliminate() > > if (inst->dst.file == GRF) { > if (!inst->is_partial_write()) { > - int var = live_intervals->var_from_reg(&inst->dst); > + int var = live_intervals->var_from_reg(inst->dst); > for (int i = 0; i < inst->regs_written; i++) { >BITSET_CLEAR(live, var + i); > } > @@ -99,7 +99,7 @@ fs_visitor::dead_code_eliminate() > > for (int i = 0; i < inst->sources; i++) { > if (inst->src[i].file == GRF) { > - int var = live_intervals->var_from_reg(&inst->src[i]); > + int var = live_intervals->var_from_reg(inst->src[i]); > > for (int j = 0; j < inst->regs_read(this, i); j++) { >BITSET_SET(live, var + j); > diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > index dbe1d34..b5c81cc 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp > @@ -54,9 +54,9 @@ using namespace brw; > > void > fs_live_variables::setup_one_read(struct block_data *bd, fs_inst *inst, > - int ip, fs_reg reg) > + int ip, const fs_reg &reg) > { > - int var = var_from_reg(&reg); > + int var = var_from_reg(reg); > assert(var < num_vars); > > /* In most cases, a register can be written over safely by the > @@ -106,9 +106,9 @@ fs_live_variables::setup_one_read(struct block_data *bd, fs_inst *inst, > > void > fs_live_variables::setup_one_write(struct block_data *bd, fs_inst *inst, > - int ip, fs_reg reg) > + int ip, const fs_reg &reg) > { > - int var = var_from_reg(&reg); > + int var = var_from_reg(reg); > assert(var < num_vars); > > start[var] = MIN2(start[var], ip); > @@ -272,12 +272,6 @@ fs_live_variables::compute_start_end() > } > } > > -int > -fs_live_variables::var_from_reg(fs_reg *reg) > -{ > - return var_from_vgrf[reg->reg] + reg->reg_offset; > -} > - > fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg) > : v(v), cfg(cfg) > { > diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > index 2bfb583..a52f922 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.h > @@ -66,7 +66,10 @@ public: > ~fs_live_variables(); > > bool vars_interfere(int a, int b); > - int var_from_reg(fs_reg *reg); > + int var_from_reg(const fs_reg &reg) const > + { > + return var_from_vgrf[reg.reg] + reg.reg_offset; > + } > > /** Map from virtual GRF number to index in block_data arrays. 
*/ > int *var_from_vgrf; > @@ -96,8 +99,10 @@ public: > > protected: > void setup_def_use(); > - void setup_one_read(struct block_data *bd, fs_inst *inst, int ip, fs_reg reg); > - void setup_one_write(struct block_data *bd, fs_inst *inst, int ip, fs_reg reg); > + void setup_one_read(struct block_data *bd, fs_inst *inst, int ip, > + const fs_reg &reg); > + void setup_one_write(struct block_data *bd, fs_inst *inst, int ip, > +const fs_reg &reg); > void compute_live_variables(); > void compute_start_end(); > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_saturat
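For readers skimming the quoted diff, the shape of the change can be sketched in isolation. This is only an illustration, not the Mesa code: `Reg` and `LiveVariables` are hypothetical stand-ins for `fs_reg` and `fs_live_variables`, and the `std::vector` replaces the raw `int *var_from_vgrf` purely to keep the sketch self-contained. The point is that a helper defined in the class body is implicitly inline, and that taking a `const` reference avoids both the by-value copy and the `&` at every call site.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for fs_reg; only the two fields the helper reads.
struct Reg {
   int reg;         // virtual GRF number
   int reg_offset;  // offset within that virtual GRF
};

// Hypothetical stand-in for fs_live_variables.
class LiveVariables {
public:
   explicit LiveVariables(std::vector<int> map)
      : var_from_vgrf(std::move(map)) {}

   // Defined inside the class body, so it is implicitly inline.
   // The const reference means no copy is made and callers pass the
   // register object directly rather than its address.
   int var_from_reg(const Reg &reg) const
   {
      assert(reg.reg >= 0 &&
             static_cast<std::size_t>(reg.reg) < var_from_vgrf.size());
      return var_from_vgrf[reg.reg] + reg.reg_offset;
   }

private:
   // Map from virtual GRF number to variable index.
   std::vector<int> var_from_vgrf;
};
```

A const reference also binds to temporaries, so a call such as `lv.var_from_reg(Reg{2, 0})` compiles, which the pointer-taking signature would not have allowed.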
Re: [Mesa-dev] [PATCH 6/5] i965/fs: Remove opt_drop_redundant_mov_to_flags().
On Monday, November 03, 2014 11:58:06 AM Matt Turner wrote: > Dead code elimination now handles this. > --- > Depends on the previously sent 5 patch series. Nice! Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH 1/2] i965/vec4: Track liveness of the flag register.
On Monday, November 03, 2014 01:34:48 PM Matt Turner wrote: > --- > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 28 ++ > .../drivers/dri/i965/brw_vec4_live_variables.h | 5 > 2 files changed, 33 insertions(+) Patch 1 is: Reviewed-by: Kenneth Graunke
Re: [Mesa-dev] [PATCH 2/2] i965/vec4: Rewrite dead code elimination to use live in/out.
On Monday, November 03, 2014 01:34:49 PM Matt Turner wrote: > Improves 359 shaders by >=10% > 114 shaders by >=20% > 91 shaders by >=30% > 82 shaders by >=40% > 22 shaders by >=50% >4 shaders by >=60% >2 shaders by >=80% > > total instructions in shared programs: 5505182 -> 5482260 (-0.42%) > instructions in affected programs: 364629 -> 341707 (-6.29%) > --- > src/mesa/drivers/dri/i965/Makefile.sources | 1 + > src/mesa/drivers/dri/i965/brw_vec4.cpp | 155 --- > .../dri/i965/brw_vec4_dead_code_eliminate.cpp | 169 > + > 3 files changed, 170 insertions(+), 155 deletions(-) > create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp > > diff --git a/src/mesa/drivers/dri/i965/Makefile.sources > b/src/mesa/drivers/dri/i965/Makefile.sources > index 711aabe..10be4f1 100644 > --- a/src/mesa/drivers/dri/i965/Makefile.sources > +++ b/src/mesa/drivers/dri/i965/Makefile.sources > @@ -102,6 +102,7 @@ i965_FILES = \ > brw_vec4.cpp \ > brw_vec4_copy_propagation.cpp \ > brw_vec4_cse.cpp \ > + brw_vec4_dead_code_eliminate.cpp \ > brw_vec4_generator.cpp \ > brw_vec4_gs_visitor.cpp \ > brw_vec4_live_variables.cpp \ > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index df589b8..6560351 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -411,161 +411,6 @@ vec4_visitor::opt_reduce_swizzle() > return progress; > } > > -static bool > -try_eliminate_instruction(vec4_instruction *inst, int new_writemask, > - const struct brw_context *brw) > -{ > - if (inst->has_side_effects()) > - return false; > - > - if (new_writemask == 0) { > - /* Don't dead code eliminate instructions that write to the > - * accumulator as a side-effect. Instead just set the destination > - * to the null register to free it. 
> - */ > - if (inst->writes_accumulator || inst->writes_flag()) { > - inst->dst = dst_reg(retype(brw_null_reg(), inst->dst.type)); > - } else { > - inst->opcode = BRW_OPCODE_NOP; > - } > - > - return true; > - } else if (inst->dst.writemask != new_writemask) { > - switch (inst->opcode) { > - case SHADER_OPCODE_TXF_CMS: > - case SHADER_OPCODE_GEN4_SCRATCH_READ: > - case VS_OPCODE_PULL_CONSTANT_LOAD: > - case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7: > - break; > - default: > - /* Do not set a writemask on Gen6 for math instructions, those are > - * executed using align1 mode that does not support a destination > mask. > - */ > - if (!(brw->gen == 6 && inst->is_math()) && !inst->is_tex()) { > -inst->dst.writemask = new_writemask; > -return true; > - } > - } > - } > - > - return false; > -} > - > -/** > - * Must be called after calculate_live_intervals() to remove unused > - * writes to registers -- register allocation will fail otherwise > - * because something deffed but not used won't be considered to > - * interfere with other regs. 
> - */ > -bool > -vec4_visitor::dead_code_eliminate() > -{ > - bool progress = false; > - int pc = -1; > - > - calculate_live_intervals(); > - > - foreach_block_and_inst(block, vec4_instruction, inst, cfg) { > - pc++; > - > - bool inst_writes_flag = false; > - if (inst->dst.file != GRF) { > - if (inst->dst.is_null() && inst->writes_flag()) { > -inst_writes_flag = true; > - } else { > -continue; > - } > - } > - > - if (inst->dst.file == GRF) { > - int write_mask = inst->dst.writemask; > - > - for (int c = 0; c < 4; c++) { > -if (write_mask & (1 << c)) { > - assert(this->virtual_grf_end[inst->dst.reg * 4 + c] >= pc); > - if (this->virtual_grf_end[inst->dst.reg * 4 + c] == pc) { > - write_mask &= ~(1 << c); > - } > -} > - } > - > - progress = try_eliminate_instruction(inst, write_mask, brw) || > -progress; > - } > - > - if (inst->predicate || inst->prev == NULL) > - continue; > - > - int dead_channels; > - if (inst_writes_flag) { > -/* Arbitrarily chosen, other than not being an xyzw writemask. */ > -#define FLAG_WRITEMASK (1 << 5) > - dead_channels = inst->reads_flag() ? 0 : FLAG_WRITEMASK; > - } else { > - dead_channels = inst->dst.writemask; > - > - for (int i = 0; i < 3; i++) { > -if (inst->src[i].file != GRF || > -inst->src[i].reg != inst->dst.reg) > - continue; > - > -for (int j = 0; j < 4; j++) { > - int swiz = BRW_GET_SWZ(inst->src[i].swizzle, j); > -
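The new brw_vec4_dead_code_eliminate.cpp is not visible in the quoted text, so the following is only a rough sketch of what liveness-based dead code elimination generally looks like, loosely modelled on the fs_visitor::dead_code_eliminate() loop quoted earlier in this thread. Everything here is an assumption for illustration: `Inst`, its fields, and `dead_code_eliminate_block` are hypothetical, one flag per variable replaces the BITSET macros, and per-channel writemasks and flag-register tracking are left out entirely.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction: one destination and up to two
// sources, each a variable index. A negative index means "unused".
struct Inst {
   int dst;
   int src0;
   int src1;
   bool has_side_effects;
   bool is_nop;
};

// Walk one basic block backwards. 'live' starts as the block's live-out
// set (one flag per variable) and is updated as we go, so at each
// instruction it says which variables are still needed afterwards.
// Returns true if any instruction was turned into a NOP.
bool dead_code_eliminate_block(std::vector<Inst> &block,
                               std::vector<bool> live)
{
   bool progress = false;

   for (auto it = block.rbegin(); it != block.rend(); ++it) {
      Inst &inst = *it;
      if (inst.is_nop)
         continue;

      const bool result_live = inst.dst >= 0 && live[inst.dst];

      // A write that nobody reads later, from an instruction with no
      // side effects, can be dropped. Its sources are not marked live.
      if (inst.dst >= 0 && !result_live && !inst.has_side_effects) {
         inst.is_nop = true;
         progress = true;
         continue;
      }

      // The instruction stays: its destination is (re)defined here, so
      // it is dead above this point, and its sources become live.
      if (inst.dst >= 0)
         live[inst.dst] = false;
      if (inst.src0 >= 0)
         live[inst.src0] = true;
      if (inst.src1 >= 0)
         live[inst.src1] = true;
   }

   return progress;
}
```

Compared with the interval-based version being removed above, which compares virtual_grf_end against the instruction counter, a backward walk over a live set needs no global instruction numbering. Whether the actual patch is structured this way cannot be confirmed from the truncated quote.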
Re: [Mesa-dev] [PATCH 2/3] mesa: helper macros to enable per function optimisations
On Sun, 2014-11-09 at 07:48 +0330, Siavash Eliasi wrote: > I know that's a time saver for developer (gcc function multi > versioning), however I still do prefer the approach (my own ^^ ) which > works on all setups regardless of hardware and compiler (well, any sane > compiler ICC, GCC, Clang,...). > > Best regards, > Siavash Eliasi. This isn't about saving development time; the whole point of it is to make things work *well* on all compilers. In an ideal world we wouldn't have to do compiler-specific things, but in the real world, if you want Mesa to be fast, this is the right direction to go in. To look at it another way, putting different versions of functions in another file is a hack too. At least this way we can have link-time optimisations, inlining improvements, etc. I think you may not be following what this patch is doing. This patch is not an alternative to your patch; it works alongside it. Your patch is about runtime selection, while this patch is about compile-time selection.
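To make the runtime versus compile-time distinction concrete, here is a minimal sketch in the spirit of the common_x86_features.h patch at the top of this thread. It is not the helper macros from patch 2/3, which are not quoted here. The names `runtime_cpu_features`, `FEATURE_SSE2`, `has_sse2`, `sum_scalar`, `sum_fast` and `sum` are all hypothetical; only the compiler-defined `__SSE2__` macro is real.

```cpp
#include <cassert>

// Hypothetical runtime feature word; a real driver would fill this in
// once at start-up from CPUID.
unsigned runtime_cpu_features = 0;
enum : unsigned { FEATURE_SSE2 = 1u << 0 };  // hypothetical bit assignment

// Compile-time selection: when the compiler already targets SSE2 (for
// example with -msse2 or on x86-64), the check is the constant 1 and the
// branch below can be folded away. Otherwise fall back to the runtime test.
#ifdef __SSE2__
#define has_sse2() 1
#else
#define has_sse2() ((runtime_cpu_features & FEATURE_SSE2) != 0)
#endif

// Portable reference path.
static int sum_scalar(const int *v, int n)
{
   int total = 0;
   for (int i = 0; i < n; i++)
      total += v[i];
   return total;
}

// Stand-in for a SIMD path. It reuses the scalar loop so the sketch
// builds on any target; a real version would use intrinsics.
static int sum_fast(const int *v, int n)
{
   return sum_scalar(v, n);
}

// Callers see a single entry point; which path runs depends on the
// compile-time target first and the runtime feature word second.
int sum(const int *v, int n)
{
   return has_sse2() ? sum_fast(v, n) : sum_scalar(v, n);
}
```

The two mechanisms compose: the runtime check remains for generic builds, and a build targeting a specific CPU simply never evaluates it.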