Re: [Mesa-dev] [PATCH] android: radeon(s): fix libdrm_amdgpu shared dependencies
2017-05-22 1:34 GMT+02:00 Mauro Rossi : > > > 2017-05-21 18:27 GMT+02:00 Emil Velikov : > >> Hi Mauro, >> >> There is a similar issue when building with autotools. There's a few >> ways to address this so let's see what the devs prefer. >> >> Another temporary workaround is to build radeonsi alongside the other >> radeon drivers. >> >> -Emil >> > > Just FYI, I am already building radeonsi (target libmesa_pipe_radeonsi) > Mauro > ...continuing the sentence so building radeonsi (even if not working because llvm 3.8 in nougat) is not a workaround the r% drivers building errors. After commit 44b29dd "amd/common: add missing libdrm include path", is there an alternative proposed solution for android building errors that compares with submitted patch? Mauro ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 101211] Mesa swrast issue with visualization on BE PPC PPC64
https://bugs.freedesktop.org/show_bug.cgi?id=101211 Bug ID: 101211 Summary: Mesa swrast issue with visualization on BE PPC PPC64 Product: Mesa Version: 17.1 Hardware: PowerPC OS: Linux (All) Status: NEW Severity: blocker Priority: medium Component: Drivers/Gallium/swr Assignee: mesa-dev@lists.freedesktop.org Reporter: intermedi...@hotmail.com QA Contact: mesa-dev@lists.freedesktop.org Created attachment 131538 --> https://bugs.freedesktop.org/attachment.cgi?id=131538&action=edit mad glxgears Hi, from version 17 there is a bad visualization in swrast in BE hardware i have totally broken gfx or mad visualization. it made system not usable if is i n use gdm3 (eg lubuntu, ubuntu mate ppc32 distro) I been test it with official mesa from fedora server 25 PPC64 and on my self build mesa with same result. i attached some example My glxinfo on Qoriq P50xx processor (i have the same on ibm 970MP machine) Extended renderer info (GLX_MESA_query_renderer): Vendor: VMware, Inc. (0x) Device: llvmpipe (LLVM 3.9, 128 bits) (0x) Version: 17.2.0 Accelerated: no Video memory: 16043MB Unified memory: no Preferred profile: core (0x1) Max core profile version: 3.3 Max compat profile version: 3.0 Max GLES1 profile version: 1.1 Max GLES[23] profile version: 3.0 OpenGL vendor string: VMware, Inc. OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.9, 128 bits) OpenGL core profile version string: 3.3 (Core Profile) Mesa 17.2.0-devel (git-fe43788) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 101211] Mesa swrast issue with visualization on BE PPC PPC64
https://bugs.freedesktop.org/show_bug.cgi?id=101211 --- Comment #1 from intermedi...@hotmail.com --- Created attachment 131539 --> https://bugs.freedesktop.org/attachment.cgi?id=131539&action=edit sdl display -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 101211] Mesa swrast issue with visualization on BE PPC PPC64
https://bugs.freedesktop.org/show_bug.cgi?id=101211 --- Comment #2 from intermedi...@hotmail.com --- Created attachment 131540 --> https://bugs.freedesktop.org/attachment.cgi?id=131540&action=edit darkplace quake -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 101211] Mesa swrast issue with visualization on BE PPC PPC64
https://bugs.freedesktop.org/show_bug.cgi?id=101211 --- Comment #3 from intermedi...@hotmail.com --- Created attachment 131541 --> https://bugs.freedesktop.org/attachment.cgi?id=131541&action=edit glx tunnel example -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/24] swr/rast: remove extra pixel center adjustment in BinPostSetupPoints
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 5 - 1 file changed, 5 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index 4c6a5b1..61b3b66 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -1528,11 +1528,6 @@ void BinPostSetupPoints( PFN_PROCESS_ATTRIBUTES pfnProcessAttribs = GetProcessAttributesFunc(1, state.backendState.swizzleEnable, state.backendState.constantInterpolationMask); -// adjust for pixel center location -simdscalar offset = g_pixelOffsets[rastState.pixelLocation]; -primVerts.x = _simd_add_ps(primVerts.x, offset); -primVerts.y = _simd_add_ps(primVerts.y, offset); - // convert to fixed point simdscalari vXi, vYi; vXi = fpToFixedPointVertical(primVerts.x); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/24] swr/rast: clean up whitespace
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 1 - 1 file changed, 1 deletion(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index a780dfc..f28981b 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -2192,7 +2192,6 @@ void BinPostSetupPoints_simd16( AR_END(FEBinPoints, 1); } - void SIMDAPI BinPoints_simd16( DRAW_CONTEXT *pDC, PA_STATE& pa, -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 22/24] swr/rast: code cleanup (no functional change)
--- src/gallium/drivers/swr/rasterizer/core/clip.h | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h index ad2745b..3e8ea33 100644 --- a/src/gallium/drivers/swr/rasterizer/core/clip.h +++ b/src/gallium/drivers/swr/rasterizer/core/clip.h @@ -947,8 +947,9 @@ public: // execute the clipper stage void ExecuteStage(PA_STATE& pa, simdvector prim[], uint32_t primMask, simdscalari primId, simdscalari viewportIdx) { -SWR_ASSERT(pa.pDC != nullptr); -SWR_CONTEXT* pContext = pa.pDC->pContext; +SWR_ASSERT(this->pDC != nullptr); +SWR_CONTEXT* pContext = this->pDC->pContext; +const API_STATE& apiState = this->pDC->pState->state; // set up binner based on PA state PFN_PROCESS_PRIMS pfnBinner; @@ -965,7 +966,7 @@ public: pfnBinner = BinLines; break; default: -pfnBinner = GetBinTrianglesFunc((pa.pDC->pState->state.rastState.conservativeRast > 0)); +pfnBinner = GetBinTrianglesFunc((apiState.rastState.conservativeRast > 0)); break; }; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/24] swr: update rasterizer
Highlights include lots of simd16 progress and code cleanups. No regressions on piglit or vtk ctest. Tim Rowley (24): swr/rast: remove extra pixel center adjustment in BinPostSetupPoints swr/rast: adjust BinPostSetupPoints* function signature swr/rast: clean up whitespace swr/rast: add support for DX1_RGB{_SRGB} formats swr/rast: add CreateDirectoryPath to recursively create directories swr/rast: silence write of cfg graph swr/rast: SIMD16 FE - primitive assembly simplification swr/rast: fix _simd16_movemask_(ps,pd) native AVX512 intrinsics swr/rast: SIMD16 FE - interleaved simdvertex output in GS swr/rast: SIMD16 FE - fix conservative rasterization swr/rast: SIMD16 FE - simplify/refactor StreamOut swr/rast: SIMD16 FE - fix PA_STATE_OP::Reset() swr/rast: SIMD16 FE - add SIMD16 types to jitter swr/rast: make simd16 logicops avx512f safe swr/rast: add renderTargetArrayIndex to SWR_PS_CONTEXT swr/rast: SIMD16 FE - fix/use SIMD16 calcDeterminantIntVertical() swr/rast: move binner utility functions to binner.h swr/rast: code cleanup (no functional change) swr/rast: remove unused functions swr/rast: move wireframe/point triangle binning after culling swr/rast: allow early-z if shader uses depth value swr/rast: code cleanup (no functional change) swr/rast: whitespace changes swr/rast: code cleanup (no functional change) src/gallium/drivers/swr/Makefile.sources | 1 + .../swr/rasterizer/codegen/gen_llvm_types.py | 4 +- .../drivers/swr/rasterizer/common/formats.cpp | 80 ++--- .../drivers/swr/rasterizer/common/formats.h| 4 +- src/gallium/drivers/swr/rasterizer/common/os.cpp | 48 ++- src/gallium/drivers/swr/rasterizer/common/os.h | 3 +- .../drivers/swr/rasterizer/common/simd16intrin.h | 44 ++- .../drivers/swr/rasterizer/common/simdintrin.h | 24 ++ src/gallium/drivers/swr/rasterizer/core/api.cpp| 10 +- src/gallium/drivers/swr/rasterizer/core/backend.h | 1 + src/gallium/drivers/swr/rasterizer/core/binner.cpp | 334 ++--- src/gallium/drivers/swr/rasterizer/core/binner.h | 223 ++ src/gallium/drivers/swr/rasterizer/core/clip.h | 7 +- src/gallium/drivers/swr/rasterizer/core/context.h | 2 - .../drivers/swr/rasterizer/core/format_traits.h| 46 ++- .../drivers/swr/rasterizer/core/frontend.cpp | 64 +--- src/gallium/drivers/swr/rasterizer/core/frontend.h | 98 ++ src/gallium/drivers/swr/rasterizer/core/pa.h | 32 +- src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp | 53 +--- src/gallium/drivers/swr/rasterizer/core/state.h| 20 +- .../drivers/swr/rasterizer/jitter/JitManager.cpp | 16 +- src/gallium/drivers/swr/swr_shader.cpp | 29 +- 22 files changed, 629 insertions(+), 514 deletions(-) create mode 100644 src/gallium/drivers/swr/rasterizer/core/binner.h -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 19/24] swr/rast: remove unused functions
--- src/gallium/drivers/swr/rasterizer/core/frontend.h | 28 -- 1 file changed, 28 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.h b/src/gallium/drivers/swr/rasterizer/core/frontend.h index 9f347e1..a2ce3a1 100644 --- a/src/gallium/drivers/swr/rasterizer/core/frontend.h +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.h @@ -63,21 +63,6 @@ void triangleSetupAB(const __m128 vX, const __m128 vY, __m128 & vA, __m128 & vB) } INLINE -void triangleSetupABVertical(const simdscalar vX[3], const simdscalar vY[3], simdscalar (&vA)[3], simdscalar (&vB)[3]) -{ -// generate edge equations -// A = y0 - y1 -// B = x1 - x0 -vA[0] = _simd_sub_ps(vY[0], vY[1]); -vA[1] = _simd_sub_ps(vY[1], vY[2]); -vA[2] = _simd_sub_ps(vY[2], vY[0]); - -vB[0] = _simd_sub_ps(vX[1], vX[0]); -vB[1] = _simd_sub_ps(vX[2], vX[1]); -vB[2] = _simd_sub_ps(vX[0], vX[2]); -} - -INLINE void triangleSetupABInt(const __m128i vX, const __m128i vY, __m128i & vA, __m128i & vB) { // generate edge equations @@ -239,19 +224,6 @@ void triangleSetupC(const __m128 vX, const __m128 vY, const __m128 vA, const __m vC = _mm_sub_ps(vC, vCy); } -INLINE -void viewportTransform(__m128 &vX, __m128 &vY, __m128 &vZ, const SWR_VIEWPORT_MATRIX &vpMatrix) -{ -vX = _mm_mul_ps(vX, _mm_set1_ps(vpMatrix.m00)); -vX = _mm_add_ps(vX, _mm_set1_ps(vpMatrix.m30)); - -vY = _mm_mul_ps(vY, _mm_set1_ps(vpMatrix.m11)); -vY = _mm_add_ps(vY, _mm_set1_ps(vpMatrix.m31)); - -vZ = _mm_mul_ps(vZ, _mm_set1_ps(vpMatrix.m22)); -vZ = _mm_add_ps(vZ, _mm_set1_ps(vpMatrix.m32)); -} - template INLINE void viewportTransform(simdvector *v, const SWR_VIEWPORT_MATRICES & vpMatrices) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/24] swr/rast: SIMD16 FE - simplify/refactor StreamOut
--- .../drivers/swr/rasterizer/core/frontend.cpp | 42 -- 1 file changed, 42 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp index e88246f..dfbbc58 100644 --- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp @@ -495,9 +495,6 @@ static void StreamOut( PA_STATE& pa, uint32_t workerId, uint32_t* pPrimData, -#if USE_SIMD16_FRONTEND -uint32_t numPrims_simd8, -#endif uint32_t streamIndex) { SWR_CONTEXT *pContext = pDC->pContext; @@ -520,11 +517,7 @@ static void StreamOut( soContext.pBuffer[i] = &state.soBuffer[i]; } -#if USE_SIMD16_FRONTEND -uint32_t numPrims = numPrims_simd8; -#else uint32_t numPrims = pa.NumPrims(); -#endif for (uint32_t primIndex = 0; primIndex < numPrims; ++primIndex) { @@ -948,22 +941,8 @@ static void GeometryShaderStage( if (HasStreamOutT::value) { -#if USE_SIMD16_FRONTEND -const uint32_t numPrims = gsPa.NumPrims(); -const uint32_t numPrims_lo = std::min(numPrims, KNOB_SIMD_WIDTH); -const uint32_t numPrims_hi = std::max(numPrims, KNOB_SIMD_WIDTH) - KNOB_SIMD_WIDTH; - gsPa.useAlternateOffset = false; -StreamOut(pDC, gsPa, workerId, pSoPrimData, numPrims_lo, stream); - -if (numPrims_hi) -{ -gsPa.useAlternateOffset = true; -StreamOut(pDC, gsPa, workerId, pSoPrimData, numPrims_hi, stream); -} -#else StreamOut(pDC, gsPa, workerId, pSoPrimData, stream); -#endif } if (HasRastT::value && state.soState.streamToRasterizer == stream) @@ -1360,18 +1339,8 @@ static void TessellationStages( { if (HasStreamOutT::value) { -#if USE_SIMD16_FRONTEND tessPa.useAlternateOffset = false; -StreamOut(pDC, tessPa, workerId, pSoPrimData, numPrims_lo, 0); - -if (numPrims_hi) -{ -tessPa.useAlternateOffset = true; -StreamOut(pDC, tessPa, workerId, pSoPrimData, numPrims_hi, 0); -} -#else StreamOut(pDC, tessPa, workerId, pSoPrimData, 0); -#endif } if (HasRastT::value) @@ -1747,19 +1716,8 @@ void ProcessDraw( // If streamout is enabled then stream vertices out to memory. if (HasStreamOutT::value) { -#if 1 -pa.useAlternateOffset = false; -StreamOut(pDC, pa, workerId, pSoPrimData, numPrims_lo, 0); - -if (numPrims_hi) -{ -pa.useAlternateOffset = true; -StreamOut(pDC, pa, workerId, pSoPrimData, numPrims_hi, 0); -} -#else pa.useAlternateOffset = false; StreamOut(pDC, pa, workerId, pSoPrimData, 0); -#endif } if (HasRastT::value) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/24] swr/rast: adjust BinPostSetupPoints* function signature
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index 61b3b66..a780dfc 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -40,7 +40,7 @@ void BinPostSetupPoints(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, simd #if USE_SIMD16_FRONTEND void BinPostSetupLines_simd16(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, simd16vector prims[3], simd16scalar vRecipW[2], uint32_t primMask, simd16scalari primID, simd16scalari viewportIdx); -void BinPostSetupPoints_simd16(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, simdvector prims[], uint32_t primMask, simdscalari primID, simdscalari viewportIdx); +void BinPostSetupPoints_simd16(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, simd16vector prims[], uint32_t primMask, simd16scalari primID, simd16scalari viewportIdx); #endif // @@ -1508,7 +1508,7 @@ void BinPostSetupPoints( DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, -simdvector prim[3], +simdvector prim[], uint32_t primMask, simdscalari primID, simdscalari viewportIdx) @@ -1876,7 +1876,7 @@ void BinPostSetupPoints_simd16( DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, -simd16vector prim[3], +simd16vector prim[], uint32_t primMask, simd16scalari primID, simd16scalari viewportIdx) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/24] swr/rast: SIMD16 FE - primitive assembly simplification
Reduce/simplify vertex storage usage in PA_STATE_OPT, fix PA GetNextVSOutput wrap-around behaviour and eliminate unnecessary SIMDVERTEX copies/storage for tri fan in PA_STATE_OPT Fixes the OpenGL tri fan test failure under SIMD16 - triangle-rasterization-overdraw. --- src/gallium/drivers/swr/rasterizer/core/pa.h | 29 src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp | 53 +- 2 files changed, 32 insertions(+), 50 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/pa.h b/src/gallium/drivers/swr/rasterizer/core/pa.h index 403efe0..7c39056 100644 --- a/src/gallium/drivers/swr/rasterizer/core/pa.h +++ b/src/gallium/drivers/swr/rasterizer/core/pa.h @@ -119,8 +119,6 @@ struct PA_STATE // cuts struct PA_STATE_OPT : public PA_STATE { -SIMDVERTEX leadingVertex;// For tri-fan - uint32_t numPrims{ 0 }; // Total number of primitives for draw. uint32_t numPrimsComplete{ 0 }; // Total number of complete primitives. @@ -128,7 +126,7 @@ struct PA_STATE_OPT : public PA_STATE uint32_t cur{ 0 }; // index to current VS output. uint32_t prev{ 0 }; // index to prev VS output. Not really needed in the state. -uint32_t first{ 0 }; // index to first VS output. Used for trifan. +const uint32_t first{ 0 }; // index to first VS output. Used for tri fan and line loop. uint32_t counter{ 0 }; // state counter bool reset{ false }; // reset state @@ -245,13 +243,27 @@ struct PA_STATE_OPT : public PA_STATE SIMDVERTEX& GetNextVsOutput() { +const uint32_t numSimdVerts = streamSizeInVerts / SIMD_WIDTH; + // increment cur and prev indices -const uint32_t numSimdVerts = this->streamSizeInVerts / SIMD_WIDTH; -this->prev = this->cur; // prev is undefined for first state. -this->cur = this->counter % numSimdVerts; +if (counter < numSimdVerts) +{ +// prev undefined for first state +prev = cur; +cur = counter; +} +else +{ +// swap/recycle last two simd verts for prev and cur, leave other simd verts intact in the buffer +uint32_t temp = prev; + +prev = cur; +cur = temp; +} + +SWR_ASSERT(cur < numSimdVerts); -SIMDVERTEX* pVertex = (SIMDVERTEX*)pStreamBase; -return pVertex[this->cur]; +return reinterpret_cast(pStreamBase)[cur]; } SIMDMASK& GetNextVsIndices() @@ -317,7 +329,6 @@ struct PA_STATE_OPT : public PA_STATE this->numSimdPrims = 0; this->cur = 0; this->prev = 0; -this->first = 0; this->counter = 0; this->reset = false; } diff --git a/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp b/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp index d0ee18a..897079c 100644 --- a/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp @@ -1213,10 +1213,6 @@ void PaTriStripSingle0(PA_STATE_OPT& pa, uint32_t slot, uint32_t primIndex, __m1 bool PaTriFan0(PA_STATE_OPT& pa, uint32_t slot, simdvector verts[]) { -// store off leading vertex for attributes -PA_STATE_OPT::SIMDVERTEX* pVertex = (PA_STATE_OPT::SIMDVERTEX*)pa.pStreamBase; -pa.leadingVertex = pVertex[pa.cur]; - SetNextPaState(pa, PaTriFan1, PaTriFanSingle0); return false;// Not enough vertices to assemble 8 triangles. } @@ -1228,11 +1224,7 @@ bool PaTriFan1(PA_STATE_OPT& pa, uint32_t slot, simdvector verts[]) simdvector a; simdvector b; -#if 1 const simd16vector &leadvert_16 = PaGetSimdVector_simd16(pa, pa.first, slot); -#else -const simd16vector &leadvert_16 = pa.leadingVertex.attrib[slot]; -#endif if (!pa.useAlternateOffset) { @@ -1260,10 +1252,9 @@ bool PaTriFan1(PA_STATE_OPT& pa, uint32_t slot, simdvector verts[]) } #else -simdvector &leadVert = pa.leadingVertex.attrib[slot]; - -simdvector &a = PaGetSimdVector(pa, pa.prev, slot); -simdvector &b = PaGetSimdVector(pa, pa.cur, slot); +const simdvector &leadVert = PaGetSimdVector(pa, pa.first, slot); +const simdvector &a = PaGetSimdVector(pa, pa.prev, slot); +const simdvector &b = PaGetSimdVector(pa, pa.cur, slot); #endif simdscalar s; @@ -1301,23 +1292,7 @@ bool PaTriFan0_simd16(PA_STATE_OPT& pa, uint32_t slot, simd16vector verts[]) bool PaTriFan1_simd16(PA_STATE_OPT& pa, uint32_t slot, simd16vector verts[]) { -#if USE_SIMD16_FRONTEND -#if 1 const simd16vector &a = PaGetSimdVector_simd16(pa, pa.first, slot); -#else -const simd16vector &a = pa.leadingVertex.attrib[slot]; -#endif -#else -simd16vector a; - -{ -for (uint32_t i = 0; i < 4; i += 1) -{ -a[i] = _simd16_insert_ps(_simd16_setzero_ps(), pa.leadingVertex.attrib[slot][i
[Mesa-dev] [PATCH 18/24] swr/rast: code cleanup (no functional change)
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 124 +++-- 1 file changed, 64 insertions(+), 60 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index a3a3288..4667b48 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -561,40 +561,43 @@ void BinTriangles( RDTSC_EVENT(FECullZeroAreaAndBackface, _mm_popcnt_u32(origTriMask ^ triMask), 0); } -// Simple non-conformant wireframe mode, useful for debugging -if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) -{ -// construct 3 SIMD lines out of the triangle and call the line binner for each SIMD -simdvector line[2]; -simdscalar recipW[2]; -line[0] = tri[0]; -line[1] = tri[1]; -recipW[0] = vRecipW0; -recipW[1] = vRecipW1; -BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[1]; -line[1] = tri[2]; -recipW[0] = vRecipW1; -recipW[1] = vRecipW2; -BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[2]; -line[1] = tri[0]; -recipW[0] = vRecipW2; -recipW[1] = vRecipW0; -BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -AR_END(FEBinTriangles, 1); -return; -} else if (rastState.fillMode == SWR_FILLMODE_POINT) -{ -// bin 3 points - -BinPostSetupPoints(pDC, pa, workerId, &tri[0], triMask, primID, viewportIdx); -BinPostSetupPoints(pDC, pa, workerId, &tri[1], triMask, primID, viewportIdx); -BinPostSetupPoints(pDC, pa, workerId, &tri[2], triMask, primID, viewportIdx); -return; +{ +// Simple non-conformant wireframe mode, useful for debugging +if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) +{ +// construct 3 SIMD lines out of the triangle and call the line binner for each SIMD +simdvector line[2]; +simdscalar recipW[2]; +line[0] = tri[0]; +line[1] = tri[1]; +recipW[0] = vRecipW0; +recipW[1] = vRecipW1; +BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); + +line[0] = tri[1]; +line[1] = tri[2]; +recipW[0] = vRecipW1; +recipW[1] = vRecipW2; +BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); + +line[0] = tri[2]; +line[1] = tri[0]; +recipW[0] = vRecipW2; +recipW[1] = vRecipW0; +BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); + +AR_END(FEBinTriangles, 1); +return; +} +else if (rastState.fillMode == SWR_FILLMODE_POINT) +{ +// bin 3 points + +BinPostSetupPoints(pDC, pa, workerId, &tri[0], triMask, primID, viewportIdx); +BinPostSetupPoints(pDC, pa, workerId, &tri[1], triMask, primID, viewportIdx); +BinPostSetupPoints(pDC, pa, workerId, &tri[2], triMask, primID, viewportIdx); +return; +} } /// Note: these variable initializations must stay above any 'goto endBenTriangles' @@ -994,32 +997,34 @@ void SIMDAPI BinTriangles_simd16( RDTSC_EVENT(FECullZeroAreaAndBackface, _mm_popcnt_u32(origTriMask ^ triMask), 0); } -// Simple non-conformant wireframe mode, useful for debugging -if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) { -// construct 3 SIMD lines out of the triangle and call the line binner for each SIMD -simd16vector line[2]; -simd16scalar recipW[2]; -line[0] = tri[0]; -line[1] = tri[1]; -recipW[0] = vRecipW0; -recipW[1] = vRecipW1; -BinPostSetupLines_simd16(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[1]; -line[1] = tri[2]; -recipW[0] = vRecipW1; -recipW[1] = vRecipW2; -BinPostSetupLines_simd16(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[2]; -line[1] = tri[0]; -recipW[0] = vRecipW2; -recipW[1] = vRecipW0; -BinPostSetupLines_simd16(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -AR_END(FEBinTriangles, 1); -return; +// Simple non-conformant wireframe mode, useful for debugging +if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) +{ +// construct 3 SIMD lines out of the triangle and call the line binner for each SIMD +simd16vector line[2]; +simd16scalar recipW[2]; +line[0] = tri[0]; +line[1] = tri[1]; +
[Mesa-dev] [PATCH 04/24] swr/rast: add support for DX1_RGB{_SRGB} formats
--- .../drivers/swr/rasterizer/common/formats.cpp | 80 -- .../drivers/swr/rasterizer/common/formats.h| 4 +- .../drivers/swr/rasterizer/core/format_traits.h| 46 - 3 files changed, 93 insertions(+), 37 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/common/formats.cpp b/src/gallium/drivers/swr/rasterizer/common/formats.cpp index 72020ee..263dec6 100644 --- a/src/gallium/drivers/swr/rasterizer/common/formats.cpp +++ b/src/gallium/drivers/swr/rasterizer/common/formats.cpp @@ -20,7 +20,7 @@ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * -* @file gen_formats.cpp +* @file formats.cpp * * @brief auto-generated file * @@ -2729,26 +2729,16 @@ const SWR_FORMAT_INFO gFormatInfo[] = { { 0.0f, 0.0f, 0.0f, 0.0f }, 1, 1 }, -// R10G10B10_FLOAT_A2_UNORM (0xD5) +// padding (0xD5) { -"R10G10B10_FLOAT_A2_UNORM", -{ SWR_TYPE_FLOAT, SWR_TYPE_FLOAT, SWR_TYPE_FLOAT, SWR_TYPE_FLOAT }, -{ 0, 0, 0, 0x3f80 }, // Defaults for missing components -{ 0, 1, 2, 3 }, // Swizzle -{ 10, 10, 10, 2 }, // Bits per component -32, // Bits per element -4, // Bytes per element -4, // Num components -false, // isSRGB -false, // isBC -false, // isSubsampled -false, // isLuminance -{ false, false, false, false }, // Is normalized? -{ 1.0f, 1.0f, 1.0f, 1.0f }, // To float scale factor -1, // bcWidth -1, // bcHeight +nullptr, +{ SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN }, +{ 0, 0, 0, 0 },{ 0, 0, 0, 0 },{ 0, 0, 0, 0 }, +0, 0, 0, false, false, false, false, +{ false, false, false, false }, +{ 0.0f, 0.0f, 0.0f, 0.0f }, +1, 1 }, - // R32_SINT (0xD6) { "R32_SINT", @@ -5179,16 +5169,26 @@ const SWR_FORMAT_INFO gFormatInfo[] = { { 0.0f, 0.0f, 0.0f, 0.0f }, 1, 1 }, -// padding (0x180) +// DXT1_RGB_SRGB (0x180) { -nullptr, -{ SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN }, -{ 0, 0, 0, 0 },{ 0, 0, 0, 0 },{ 0, 0, 0, 0 }, -0, 0, 0, false, false, false, false, -{ false, false, false, false }, -{ 0.0f, 0.0f, 0.0f, 0.0f }, -1, 1 +"DXT1_RGB_SRGB", +{ SWR_TYPE_UNORM, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN }, +{ 0, 0, 0, 0x3f80 }, // Defaults for missing components +{ 0, 1, 2, 3 }, // Swizzle +{ 8, 8, 8, 8 }, // Bits per component +64, // Bits per element +8, // Bytes per element +1, // Num components +false, // isSRGB +true, // isBC +false, // isSubsampled +false, // isLuminance +{ true, false, false, false }, // Is normalized? +{ 1.0f / 255.0f, 0, 0, 0 }, // To float scale factor +4, // bcWidth +4, // bcHeight }, + // padding (0x181) { nullptr, @@ -5449,16 +5449,26 @@ const SWR_FORMAT_INFO gFormatInfo[] = { { 0.0f, 0.0f, 0.0f, 0.0f }, 1, 1 }, -// padding (0x191) +// DXT1_RGB (0x191) { -nullptr, -{ SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN }, -{ 0, 0, 0, 0 },{ 0, 0, 0, 0 },{ 0, 0, 0, 0 }, -0, 0, 0, false, false, false, false, -{ false, false, false, false }, -{ 0.0f, 0.0f, 0.0f, 0.0f }, -1, 1 +"DXT1_RGB", +{ SWR_TYPE_UNORM, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN, SWR_TYPE_UNKNOWN }, +{ 0, 0, 0, 0x3f80 }, // Defaults for missing components +{ 0, 1, 2, 3 }, // Swizzle +{ 8, 8, 8, 8 }, // Bits per component +64, // Bits per element +8, // Bytes per element +1, // Num components +false, // isSRGB +true, // isBC +false, // isSubsampled +false, // isLuminance +{ true, false, false, false }, // Is normalized? +{ 1.0f / 255.0f, 0, 0, 0 }, // To float scale factor +4, // bcWidth +4, // bcHeight }, + // padding (0x192) { nullptr, diff --git a/src/gallium/drivers/swr/rasterizer/common/formats.h b/src/gallium/drivers/swr/rasterizer/common/formats.h index 0056a56..f13f338 100644 --- a/src/gallium/drivers/swr/rasterizer/common/formats.h +++ b/src/gallium/drivers/swr/rasterizer/common/formats.h @@ -20,7 +20,7 @@ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS * IN THE SOFTWARE. * -* @file gen_formats.h +* @file formats.h * * @brief auto-generated file * @@ -181,6 +181,7 @@ enum SWR_FORMAT L8_SINT = 0x153, I8_UINT = 0x154, I8_SINT = 0x155, +DXT1_RGB_SRGB = 0x180, YCRCB_SWAPUVY = 0x183
[Mesa-dev] [PATCH 10/24] swr/rast: SIMD16 FE - fix conservative rasterization
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 32 ++ 1 file changed, 32 insertions(+) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index f28981b..89a2167 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -204,6 +204,38 @@ INLINE void calcBoundingBoxIntVertical(const simdvector * c bbox.ymax = _simd_add_epi32(vMaxY, _simd_set1_epi32(CT::BoundingBoxOffsetT::value)); } +#if USE_SIMD16_FRONTEND +template <> +INLINE void calcBoundingBoxIntVertical(const simd16vector * const tri, simd16scalari(&vX)[3], simd16scalari(&vY)[3], simd16BBox &bbox) +{ +// FE conservative rast traits +typedef FEConservativeRastT CT; + +simd16scalari vMinX = vX[0]; +vMinX = _simd16_min_epi32(vMinX, vX[1]); +vMinX = _simd16_min_epi32(vMinX, vX[2]); + +simd16scalari vMaxX = vX[0]; +vMaxX = _simd16_max_epi32(vMaxX, vX[1]); +vMaxX = _simd16_max_epi32(vMaxX, vX[2]); + +simd16scalari vMinY = vY[0]; +vMinY = _simd16_min_epi32(vMinY, vY[1]); +vMinY = _simd16_min_epi32(vMinY, vY[2]); + +simd16scalari vMaxY = vY[0]; +vMaxY = _simd16_max_epi32(vMaxY, vY[1]); +vMaxY = _simd16_max_epi32(vMaxY, vY[2]); + +/// Bounding box needs to be expanded by 1/512 before snapping to 16.8 for conservative rasterization +/// expand bbox by 1/256; coverage will be correctly handled in the rasterizer. +bbox.xmin = _simd16_sub_epi32(vMinX, _simd16_set1_epi32(CT::BoundingBoxOffsetT::value)); +bbox.xmax = _simd16_add_epi32(vMaxX, _simd16_set1_epi32(CT::BoundingBoxOffsetT::value)); +bbox.ymin = _simd16_sub_epi32(vMinY, _simd16_set1_epi32(CT::BoundingBoxOffsetT::value)); +bbox.ymax = _simd16_add_epi32(vMaxY, _simd16_set1_epi32(CT::BoundingBoxOffsetT::value)); +} + +#endif // /// @brief Processes attributes for the backend based on linkage mask and ///linkage map. Essentially just doing an SOA->AOS conversion and pack. -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/24] swr/rast: SIMD16 FE - add SIMD16 types to jitter
--- src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py | 4 +++- src/gallium/drivers/swr/rasterizer/core/frontend.h | 8 src/gallium/drivers/swr/rasterizer/core/state.h | 9 - 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py index 4cabde3..c153368 100644 --- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py +++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py @@ -71,7 +71,9 @@ def gen_llvm_type(type, name, is_pointer, is_pointer_pointer, is_array, is_array elif type == 'SIMD16::vectori_t': llvm_type = 'VectorType::get(Type::getInt32Ty(ctx), 16)' elif type == 'simdvector': -llvm_type = 'ArrayType::get(VectorType::get(Type::getFloatTy(ctx), pJitMgr->mVWidth), 4)' +llvm_type = 'ArrayType::get(VectorType::get(Type::getFloatTy(ctx), 8), 4)' +elif type == 'simd16vector': +llvm_type = 'ArrayType::get(VectorType::get(Type::getFloatTy(ctx), 16), 4)' elif type == 'SIMD8::attrib_t': llvm_type = 'ArrayType::get(VectorType::get(Type::getFloatTy(ctx), 8), 4)' elif type == 'SIMD16::attrib_t': diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.h b/src/gallium/drivers/swr/rasterizer/core/frontend.h index 1ce51bb..eedbcfc 100644 --- a/src/gallium/drivers/swr/rasterizer/core/frontend.h +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.h @@ -30,14 +30,6 @@ #include "context.h" #include -#if ENABLE_AVX512_SIMD16 -// TODO: this belongs in state.h alongside the simdvector definition, but there is a llvm codegen issue -struct simd16vertex -{ -simd16vectorattrib[SWR_VTX_NUM_SLOTS]; -}; - -#endif // Calculates the A and B coefficients for the 3 edges of the triangle // // maths for edge equations: diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h b/src/gallium/drivers/swr/rasterizer/core/state.h index bf735e0..8812fba 100644 --- a/src/gallium/drivers/swr/rasterizer/core/state.h +++ b/src/gallium/drivers/swr/rasterizer/core/state.h @@ -197,9 +197,16 @@ enum SWR_VTX_SLOTS // SoAoSoA struct simdvertex { -simdvectorattrib[SWR_VTX_NUM_SLOTS]; +simdvector attrib[SWR_VTX_NUM_SLOTS]; }; +#if ENABLE_AVX512_SIMD16 +struct simd16vertex +{ +simd16vectorattrib[SWR_VTX_NUM_SLOTS]; +}; + +#endif // /// SWR_VS_CONTEXT /// @brief Input to vertex shader -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 15/24] swr/rast: add renderTargetArrayIndex to SWR_PS_CONTEXT
--- src/gallium/drivers/swr/rasterizer/core/backend.h | 1 + src/gallium/drivers/swr/rasterizer/core/state.h | 10 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.h b/src/gallium/drivers/swr/rasterizer/core/backend.h index 7bb1f55..dba5041 100644 --- a/src/gallium/drivers/swr/rasterizer/core/backend.h +++ b/src/gallium/drivers/swr/rasterizer/core/backend.h @@ -499,6 +499,7 @@ void SetupPixelShaderContext(SWR_PS_CONTEXT *psContext, const SWR_MULTISAMPLE_PO psContext->pPerspAttribs = work.pPerspAttribs; psContext->frontFace = work.triFlags.frontFacing; psContext->primID = work.triFlags.primID; +psContext->renderTargetArrayIndex = work.triFlags.renderTargetArrayIndex; // save Ia/Ib/Ic and Ja/Jb/Jc if we need to reevaluate i/j/k in the shader because of pull attribs psContext->I = work.I; diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h b/src/gallium/drivers/swr/rasterizer/core/state.h index 8812fba..75d1210 100644 --- a/src/gallium/drivers/swr/rasterizer/core/state.h +++ b/src/gallium/drivers/swr/rasterizer/core/state.h @@ -342,11 +342,11 @@ struct SWR_PS_CONTEXT simdvector shaded[SWR_NUM_RENDERTARGETS]; // OUT: result color per rendertarget -uint32_t frontFace; // IN: front- 1, back- 0 -uint32_t primID;// IN: primitive ID -uint32_t sampleIndex; // IN: sampleIndex - -uint32_t rasterizerSampleCount; // IN: sample count used by the rasterizer +uint32_t frontFace; // IN: front- 1, back- 0 +uint32_t primID;// IN: primitive ID +uint32_t sampleIndex; // IN: sampleIndex +uint32_t renderTargetArrayIndex;// IN: render target array index from GS +uint32_t rasterizerSampleCount; // IN: sample count used by the rasterizer uint8_t* pColorBuffer[SWR_NUM_RENDERTARGETS]; // IN: Pointers to render target hottiles }; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/24] swr/rast: make simd16 logicops avx512f safe
Express the simd16 logicops in terms of avx512f instructions. --- src/gallium/drivers/swr/rasterizer/common/simd16intrin.h | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h index 2fe18f2..84585ff 100644 --- a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h +++ b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h @@ -941,10 +941,16 @@ INLINE simd16scalar SIMDAPI _simd16_cmp_ps_temp(simd16scalar a, simd16scalar b) #define _simd16_castpd_ps _mm512_castpd_ps #define _simd16_castps_pd _mm512_castps_pd -#define _simd16_and_ps _mm512_and_ps -#define _simd16_andnot_ps _mm512_andnot_ps -#define _simd16_or_ps _mm512_or_ps -#define _simd16_xor_ps _mm512_xor_ps +// _mm512_and_ps (and other bitwise operations) exist in AVX512DQ, +// while the functionally equivalent _mm512_and_epi32 is in AVX512F. +// Define the _simd16_*_ps versions in terms of AVX512F for broader +// support. +#define _simd16_logicop_ps(a, b, op) _simd16_castsi_ps(op##_epi32(_simd16_castps_si(a), _simd16_castps_si(b))) + +#define _simd16_and_ps(a, b)_simd16_logicop_ps(a, b, _mm512_and) +#define _simd16_andnot_ps(a, b) _simd16_logicop_ps(a, b, _mm512_andnot) +#define _simd16_or_ps(a, b) _simd16_logicop_ps(a, b, _mm512_or) +#define _simd16_xor_ps(a, b)_simd16_logicop_ps(a, b, _mm512_xor) template INLINE simd16scalar SIMDAPI _simd16_round_ps_temp(simd16scalar a) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/24] swr/rast: add CreateDirectoryPath to recursively create directories
--- src/gallium/drivers/swr/rasterizer/common/os.cpp | 48 +- src/gallium/drivers/swr/rasterizer/common/os.h | 3 +- .../drivers/swr/rasterizer/jitter/JitManager.cpp | 10 ++--- 3 files changed, 53 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/common/os.cpp b/src/gallium/drivers/swr/rasterizer/common/os.cpp index 295556a..27ad5e9 100644 --- a/src/gallium/drivers/swr/rasterizer/common/os.cpp +++ b/src/gallium/drivers/swr/rasterizer/common/os.cpp @@ -22,8 +22,14 @@ / #include "common/os.h" +#include +#include -#if defined(FORCE_LINUX) || defined(__linux__) || defined(__gnu_linux__) +#if defined(_WIN32) +#include +#endif // Windows + +#if defined(__APPLE__) || defined(FORCE_LINUX) || defined(__linux__) || defined(__gnu_linux__) #include #endif // Linux @@ -105,3 +111,43 @@ void SWR_API SetCurrentThreadName(const char* pThreadName) pthread_setname_np(pthread_self(), pThreadName); #endif // Linux } + +static void SplitString(std::vector& out_segments, const std::string& input, char splitToken) +{ +out_segments.clear(); + +std::istringstream f(input); +std::string s; +while (std::getline(f, s, splitToken)) +{ +if (s.size()) +{ +out_segments.push_back(s); +} +} +} + +void SWR_API CreateDirectoryPath(const std::string& path) +{ +#if defined(_WIN32) +SHCreateDirectoryExA(nullptr, path.c_str(), nullptr); +#endif // Windows + +#if defined(__APPLE__) || defined(FORCE_LINUX) || defined(__linux__) || defined(__gnu_linux__) +std::vector pathSegments; +SplitString(pathSegments, path, '/'); + +std::string tmpPath; +for (auto const& segment : pathSegments) +{ +tmpPath.push_back('/'); +tmpPath += segment; + +int result = mkdir(tmpPath.c_str(), 0777); +if (result == -1 && errno != EEXIST) +{ +break; +} +} +#endif // Unix +} diff --git a/src/gallium/drivers/swr/rasterizer/common/os.h b/src/gallium/drivers/swr/rasterizer/common/os.h index f9b6cca..6e4d98f 100644 --- a/src/gallium/drivers/swr/rasterizer/common/os.h +++ b/src/gallium/drivers/swr/rasterizer/common/os.h @@ -234,8 +234,6 @@ void AlignedFree(void* p) pid_t gettid(void); #define GetCurrentThreadId gettid -#define CreateDirectory(name, pSecurity) mkdir(name, 0777) - #define InterlockedCompareExchange(Dest, Exchange, Comparand) __sync_val_compare_and_swap(Dest, Comparand, Exchange) #define InterlockedExchangeAdd(Addend, Value) __sync_fetch_and_add(Addend, Value) #define InterlockedDecrement(Append) __sync_sub_and_fetch(Append, 1) @@ -281,5 +279,6 @@ typedef MEGABYTEGIGABYTE[1024]; // Defined in os.cpp void SWR_API SetCurrentThreadName(const char* pThreadName); +void SWR_API CreateDirectoryPath(const std::string& path); #endif//__SWR_OS_H__ diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp index 5d8ad27..2009db0 100644 --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp @@ -159,9 +159,9 @@ JitManager::JitManager(uint32_t simdWidth, const char *arch, const char* core) #if defined(_WIN32) if (KNOB_DUMP_SHADER_IR) { -CreateDirectory(INTEL_OUTPUT_DIR, NULL); -CreateDirectory(SWR_OUTPUT_DIR, NULL); -CreateDirectory(JITTER_OUTPUT_DIR, NULL); +CreateDirectoryPath(INTEL_OUTPUT_DIR); +CreateDirectoryPath(SWR_OUTPUT_DIR); +CreateDirectoryPath(JITTER_OUTPUT_DIR); } #endif } @@ -204,7 +204,7 @@ void JitManager::DumpAsm(Function* pFunction, const char* fileName) const char* pBaseName = strrchr(procname, '\\'); std::stringstream outDir; outDir << JITTER_OUTPUT_DIR << pBaseName << "_" << pid << std::ends; -CreateDirectory(outDir.str().c_str(), NULL); +CreateDirectoryPath(outDir.str().c_str()); #endif std::error_code EC; @@ -242,7 +242,7 @@ void JitManager::DumpToFile(Function *f, const char *fileName) const char* pBaseName = strrchr(procname, '\\'); std::stringstream outDir; outDir << JITTER_OUTPUT_DIR << pBaseName << "_" << pid << std::ends; -CreateDirectory(outDir.str().c_str(), NULL); +CreateDirectoryPath(outDir.str().c_str()); #endif std::error_code EC; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 16/24] swr/rast: SIMD16 FE - fix/use SIMD16 calcDeterminantIntVertical()
Stop double pumping the SIMD8 version. --- .../drivers/swr/rasterizer/common/simd16intrin.h | 22 .../drivers/swr/rasterizer/common/simdintrin.h | 24 + src/gallium/drivers/swr/rasterizer/core/frontend.h | 62 +++--- 3 files changed, 65 insertions(+), 43 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h index 84585ff..e303ce5 100644 --- a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h +++ b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h @@ -770,6 +770,26 @@ INLINE simd16scalari SIMDAPI _simd16_cvtepu16_epi32(simdscalari a) return result; } +INLINE simd16scalari SIMDAPI _simd16_cvtepu16_epi64(simdscalari a) +{ +simd16scalari result; + +result.lo = _simd_cvtepu16_epi64(_mm256_extractf128_si256(a, 0)); +result.hi = _simd_cvtepu16_epi64(_mm256_extractf128_si256(a, 1)); + +return result; +} + +INLINE simd16scalari SIMDAPI _simd16_cvtepu32_epi64(simdscalari a) +{ +simd16scalari result; + +result.lo = _simd_cvtepu32_epi64(_mm256_extractf128_si256(a, 0)); +result.hi = _simd_cvtepu32_epi64(_mm256_extractf128_si256(a, 1)); + +return result; +} + SIMD16_EMU_AVX512_2(simd16scalari, _simd16_packus_epi16, _simd_packus_epi16) SIMD16_EMU_AVX512_2(simd16scalari, _simd16_packs_epi16, _simd_packs_epi16) SIMD16_EMU_AVX512_2(simd16scalari, _simd16_packus_epi32, _simd_packus_epi32) @@ -1097,6 +1117,8 @@ INLINE simd16scalari SIMDAPI _simd16_cmpgt_epi8(simd16scalari a, simd16scalari b #define _simd16_cvtepu8_epi16 _mm512_cvtepu8_epi16 #define _simd16_cvtepu8_epi32 _mm512_cvtepu8_epi32 #define _simd16_cvtepu16_epi32 _mm512_cvtepu16_epi32 +#define _simd16_cvtepu16_epi64 _mm512_cvtepu16_epi64 +#define _simd16_cvtepu32_epi64 _mm512_cvtepu32_epi64 #define _simd16_packus_epi16_mm512_packus_epi16 #define _simd16_packs_epi16 _mm512_packs_epi16 #define _simd16_packus_epi32_mm512_packus_epi32 diff --git a/src/gallium/drivers/swr/rasterizer/common/simdintrin.h b/src/gallium/drivers/swr/rasterizer/common/simdintrin.h index 61c0c54..ed6e56b 100644 --- a/src/gallium/drivers/swr/rasterizer/common/simdintrin.h +++ b/src/gallium/drivers/swr/rasterizer/common/simdintrin.h @@ -456,6 +456,28 @@ __m256i _simd_cvtepu16_epi32(__m128i a) } INLINE +__m256i _simd_cvtepu16_epi64(__m128i a) +{ +__m128i resultlo = _mm_cvtepu16_epi64(a); +__m128i resulthi = _mm_cvtepu16_epi64(_mm_srli_si128(a, 4)); + +__m256i result = _mm256_castsi128_si256(resultlo); + +return _mm256_insertf128_si256(result, resulthi, 1); +} + +INLINE +__m256i _simd_cvtepu32_epi64(__m128i a) +{ +__m128i resultlo = _mm_cvtepu32_epi64(a); +__m128i resulthi = _mm_cvtepu32_epi64(_mm_srli_si128(a, 8)); + +__m256i result = _mm256_castsi128_si256(resultlo); + +return _mm256_insertf128_si256(result, resulthi, 1); +} + +INLINE __m256i _simd_packus_epi16(__m256i a, __m256i b) { __m128i alo = _mm256_extractf128_si256(a, 0); @@ -582,6 +604,8 @@ __m256i _simd_packs_epi32(__m256i a, __m256i b) #define _simd_cvtepu8_epi16 _mm256_cvtepu8_epi16 #define _simd_cvtepu8_epi32 _mm256_cvtepu8_epi32 #define _simd_cvtepu16_epi32 _mm256_cvtepu16_epi32 +#define _simd_cvtepu16_epi64 _mm256_cvtepu16_epi64 +#define _simd_cvtepu32_epi64 _mm256_cvtepu32_epi64 #define _simd_packus_epi16 _mm256_packus_epi16 #define _simd_packs_epi16 _mm256_packs_epi16 #define _simd_packus_epi32 _mm256_packus_epi32 diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.h b/src/gallium/drivers/swr/rasterizer/core/frontend.h index eedbcfc..9f347e1 100644 --- a/src/gallium/drivers/swr/rasterizer/core/frontend.h +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.h @@ -162,6 +162,7 @@ INLINE void calcDeterminantIntVertical(const simdscalari vA[3], const simdscalari vB[3], simdscalari *pvDet) { // refer to calcDeterminantInt comment for calculation explanation + // A1*B2 simdscalari vA1Lo = _simd_unpacklo_epi32(vA[1], vA[1]); // 0 0 1 1 4 4 5 5 simdscalari vA1Hi = _simd_unpackhi_epi32(vA[1], vA[1]); // 2 2 3 3 6 6 7 7 @@ -186,8 +187,10 @@ void calcDeterminantIntVertical(const simdscalari vA[3], const simdscalari vB[3] simdscalari detLo = _simd_sub_epi64(vA1B2Lo, vA2B1Lo); simdscalari detHi = _simd_sub_epi64(vA1B2Hi, vA2B1Hi); -// shuffle 0 1 4 5 -> 0 1 2 3 +// shuffle 0 1 4 5 2 3 6 7 -> 0 1 2 3 simdscalari vResultLo = _simd_permute2f128_si(detLo, detHi, 0x20); + +// shuffle 0 1 4 5 2 3 6 7 -> 4 5 6 7 simdscalari vResultHi = _simd_permute2f128_si(detLo, detHi, 0x31); pvDet[0] = vResultLo; @@ -199,57 +202,30 @@ INLINE void calcDeterminantIntVertical(const simd16scalari vA[3], const simd16scalari vB[3], simd16scalari *pvDet) { // refer to calcDeterminantInt comment for calculation explanation -// A1*B2 - -#if 1 -// TOD
[Mesa-dev] [PATCH 09/24] swr/rast: SIMD16 FE - interleaved simdvertex output in GS
Eliminates conversion copies on GS output from simdvertex to simd16vertex. --- .../drivers/swr/rasterizer/core/frontend.cpp | 22 src/gallium/drivers/swr/swr_shader.cpp | 29 +++--- 2 files changed, 31 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp index 3886c64..e88246f 100644 --- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp @@ -717,10 +717,6 @@ void ProcessStreamIdBuffer(uint32_t stream, uint8_t* pStreamIdBase, uint32_t num THREAD SWR_GS_CONTEXT tlsGsContext; -#if USE_SIMD16_FRONTEND -THREAD simd16vertex tempVertex_simd16[128]; - -#endif template struct GsBufferInfo { @@ -819,7 +815,11 @@ static void GeometryShaderStage( tlsGsContext.vert[i].attrib[VERTEX_POSITION_SLOT] = attrib[i]; } +#if USE_SIMD16_FRONTEND +const GsBufferInfo bufferInfo(state.gsState); +#else const GsBufferInfo bufferInfo(state.gsState); +#endif // record valid prims from the frontend to avoid over binning the newly generated // prims from the GS @@ -923,19 +923,7 @@ static void GeometryShaderStage( } #if USE_SIMD16_FRONTEND -// TEMPORARY: GS outputs simdvertex, PA inputs simd16vertex, so convert simdvertex to simd16vertex - -SWR_ASSERT(numEmittedVerts <= 256); - -PackPairsOfSimdVertexIntoSimd16Vertex( -tempVertex_simd16, -reinterpret_cast(pBase), -numEmittedVerts, -SWR_VTX_NUM_SLOTS); - -#endif -#if USE_SIMD16_FRONTEND -PA_STATE_CUT gsPa(pDC, reinterpret_cast(tempVertex_simd16), numEmittedVerts, reinterpret_cast(pCutBuffer), numEmittedVerts, numAttribs, pState->outputTopology, processCutVerts); +PA_STATE_CUT gsPa(pDC, pBase, numEmittedVerts, reinterpret_cast(pCutBuffer), numEmittedVerts, numAttribs, pState->outputTopology, processCutVerts); #else PA_STATE_CUT gsPa(pDC, pBase, numEmittedVerts, pCutBuffer, numEmittedVerts, numAttribs, pState->outputTopology, processCutVerts); diff --git a/src/gallium/drivers/swr/swr_shader.cpp b/src/gallium/drivers/swr/swr_shader.cpp index d55820e..2f495f5 100644 --- a/src/gallium/drivers/swr/swr_shader.cpp +++ b/src/gallium/drivers/swr/swr_shader.cpp @@ -370,8 +370,13 @@ BuilderSWR::swr_gs_llvm_emit_vertex(const struct lp_build_tgsi_gs_iface *gs_base IRB()->SetInsertPoint(unwrap(LLVMGetInsertBlock(gallivm->builder))); +#if USE_SIMD16_FRONTEND +const uint32_t simdVertexStride = sizeof(simdvertex) * 2; +const uint32_t numSimdBatches = (pGS->maxNumVerts + (mVWidth * 2) - 1) / (mVWidth * 2); +#else const uint32_t simdVertexStride = sizeof(simdvertex); -const uint32_t numSimdBatches = (pGS->maxNumVerts + 7) / 8; +const uint32_t numSimdBatches = (pGS->maxNumVerts + mVWidth - 1) / mVWidth; +#endif const uint32_t inputPrimStride = numSimdBatches * simdVertexStride; Value *pStream = LOAD(iface->pGsCtx, { 0, SWR_GS_CONTEXT_pStream }); @@ -388,8 +393,14 @@ BuilderSWR::swr_gs_llvm_emit_vertex(const struct lp_build_tgsi_gs_iface *gs_base inputPrimStride * 6, inputPrimStride * 7 } ); -Value *vVertexSlot = ASHR(unwrap(emitted_vertices_vec), 3); -Value *vSimdSlot = AND(unwrap(emitted_vertices_vec), 7); +#if USE_SIMD16_FRONTEND +const uint32_t simdShift = log2(mVWidth * 2); +Value *vSimdSlot = AND(unwrap(emitted_vertices_vec), (mVWidth * 2) - 1); +#else +const uint32_t simdShift = log2(mVWidth); +Value *vSimdSlot = AND(unwrap(emitted_vertices_vec), mVWidth - 1); +#endif +Value *vVertexSlot = ASHR(unwrap(emitted_vertices_vec), simdShift); for (uint32_t attrib = 0; attrib < iface->num_outputs; ++attrib) { uint32_t attribSlot = attrib; @@ -400,10 +411,17 @@ BuilderSWR::swr_gs_llvm_emit_vertex(const struct lp_build_tgsi_gs_iface *gs_base else if (iface->info->output_semantic_name[attrib] == TGSI_SEMANTIC_LAYER) attribSlot = VERTEX_RTAI_SLOT; +#if USE_SIMD16_FRONTEND + Value *vOffsetsAttrib = + ADD(vOffsets, MUL(vVertexSlot, VIMMED1((uint32_t)sizeof(simdvertex) * 2))); + vOffsetsAttrib = + ADD(vOffsetsAttrib, VIMMED1((uint32_t)(attribSlot*sizeof(simdvector) * 2))); +#else Value *vOffsetsAttrib = ADD(vOffsets, MUL(vVertexSlot, VIMMED1((uint32_t)sizeof(simdvertex; vOffsetsAttrib = ADD(vOffsetsAttrib, VIMMED1((uint32_t)(attribSlot*sizeof(simdvector; +#endif vOffsetsAttrib = ADD(vOffsetsAttrib, MUL(vSimdSlot, VIMMED1((uint32_t)sizeof(float; @@ -416,8 +434,13 @@ BuilderSWR::swr_gs_llvm_emit_vertex(const struct lp_build_tgsi_gs_iface *gs_base MASKED_SCATTER(vData, vPtrs, 32, vMask1); +#if USE_SIMD16_FRON
[Mesa-dev] [PATCH 12/24] swr/rast: SIMD16 FE - fix PA_STATE_OP::Reset()
Fixes instanced GS. --- src/gallium/drivers/swr/rasterizer/core/pa.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/gallium/drivers/swr/rasterizer/core/pa.h b/src/gallium/drivers/swr/rasterizer/core/pa.h index 7c39056..020399d 100644 --- a/src/gallium/drivers/swr/rasterizer/core/pa.h +++ b/src/gallium/drivers/swr/rasterizer/core/pa.h @@ -325,6 +325,9 @@ struct PA_STATE_OPT : public PA_STATE #endif this->pfnPaFunc = this->pfnPaFuncReset; +#if ENABLE_AVX512_SIMD16 +this->pfnPaFunc_simd16 = this->pfnPaFuncReset_simd16; +#endif this->numPrimsComplete = 0; this->numSimdPrims = 0; this->cur = 0; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 17/24] swr/rast: move binner utility functions to binner.h
--- src/gallium/drivers/swr/Makefile.sources | 1 + src/gallium/drivers/swr/rasterizer/core/binner.cpp | 194 +- src/gallium/drivers/swr/rasterizer/core/binner.h | 223 + 3 files changed, 225 insertions(+), 193 deletions(-) create mode 100644 src/gallium/drivers/swr/rasterizer/core/binner.h diff --git a/src/gallium/drivers/swr/Makefile.sources b/src/gallium/drivers/swr/Makefile.sources index 056449c..6b76bd1 100644 --- a/src/gallium/drivers/swr/Makefile.sources +++ b/src/gallium/drivers/swr/Makefile.sources @@ -74,6 +74,7 @@ CORE_CXX_SOURCES := \ rasterizer/core/backend.cpp \ rasterizer/core/backend.h \ rasterizer/core/binner.cpp \ + rasterizer/core/binner.h \ rasterizer/core/blend.h \ rasterizer/core/clip.cpp \ rasterizer/core/clip.h \ diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index 89a2167..a3a3288 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -26,6 +26,7 @@ * **/ +#include "binner.h" #include "context.h" #include "frontend.h" #include "conservativeRast.h" @@ -44,199 +45,6 @@ void BinPostSetupPoints_simd16(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerI #endif // -/// @brief Offsets added to post-viewport vertex positions based on -/// raster state. -static const simdscalar g_pixelOffsets[SWR_PIXEL_LOCATION_UL + 1] = -{ -_simd_set1_ps(0.0f),// SWR_PIXEL_LOCATION_CENTER -_simd_set1_ps(0.5f),// SWR_PIXEL_LOCATION_UL -}; - -#if USE_SIMD16_FRONTEND -static const simd16scalar g_pixelOffsets_simd16[SWR_PIXEL_LOCATION_UL + 1] = -{ -_simd16_set1_ps(0.0f), // SWR_PIXEL_LOCATION_CENTER -_simd16_set1_ps(0.5f), // SWR_PIXEL_LOCATION_UL -}; - -#endif -// -/// @brief Convert the X,Y coords of a triangle to the requested Fixed -/// Point precision from FP32. -template > -INLINE simdscalari fpToFixedPointVertical(const simdscalar vIn) -{ -simdscalar vFixed = _simd_mul_ps(vIn, _simd_set1_ps(PT::ScaleT::value)); -return _simd_cvtps_epi32(vFixed); -} - -#if USE_SIMD16_FRONTEND -template > -INLINE simd16scalari fpToFixedPointVertical(const simd16scalar vIn) -{ -simd16scalar vFixed = _simd16_mul_ps(vIn, _simd16_set1_ps(PT::ScaleT::value)); -return _simd16_cvtps_epi32(vFixed); -} - -#endif -// -/// @brief Helper function to set the X,Y coords of a triangle to the -/// requested Fixed Point precision from FP32. -/// @param tri: simdvector[3] of FP triangle verts -/// @param vXi: fixed point X coords of tri verts -/// @param vYi: fixed point Y coords of tri verts -INLINE static void FPToFixedPoint(const simdvector * const tri, simdscalari(&vXi)[3], simdscalari(&vYi)[3]) -{ -vXi[0] = fpToFixedPointVertical(tri[0].x); -vYi[0] = fpToFixedPointVertical(tri[0].y); -vXi[1] = fpToFixedPointVertical(tri[1].x); -vYi[1] = fpToFixedPointVertical(tri[1].y); -vXi[2] = fpToFixedPointVertical(tri[2].x); -vYi[2] = fpToFixedPointVertical(tri[2].y); -} - -#if USE_SIMD16_FRONTEND -INLINE static void FPToFixedPoint(const simd16vector * const tri, simd16scalari(&vXi)[3], simd16scalari(&vYi)[3]) -{ -vXi[0] = fpToFixedPointVertical(tri[0].x); -vYi[0] = fpToFixedPointVertical(tri[0].y); -vXi[1] = fpToFixedPointVertical(tri[1].x); -vYi[1] = fpToFixedPointVertical(tri[1].y); -vXi[2] = fpToFixedPointVertical(tri[2].x); -vYi[2] = fpToFixedPointVertical(tri[2].y); -} - -#endif -// -/// @brief Calculate bounding box for current triangle -/// @tparam CT: ConservativeRastFETraits type -/// @param vX: fixed point X position for triangle verts -/// @param vY: fixed point Y position for triangle verts -/// @param bbox: fixed point bbox -/// *Note*: expects vX, vY to be in the correct precision for the type -/// of rasterization. This avoids unnecessary FP->fixed conversions. -template -INLINE void calcBoundingBoxIntVertical(const simdvector * const tri, simdscalari(&vX)[3], simdscalari(&vY)[3], simdBBox &bbox) -{ -simdscalari vMinX = vX[0]; -vMinX = _simd_min_epi32(vMinX, vX[1]); -vMinX = _simd_min_epi32(vMinX, vX[2]); - -simdscalari vMaxX = vX[0]; -vMaxX = _simd_max_epi32(vMaxX, vX[1]); -vMaxX = _simd_max_epi32(vMaxX, vX[2]); - -simdscalari vMinY = vY[0]; -vMinY = _simd_min_epi32(vMinY, vY[1]); -vMinY = _simd_min_epi32(vMinY, vY[2]); - -simdscalari vMaxY = vY[0]; -vMaxY = _simd_max_epi32(vMaxY, vY[1]); -vMaxY = _simd_max_epi32(vMaxY, vY[2]); - -bbox.xmin = vMinX; -bbox.xmax = vMaxX;
[Mesa-dev] [PATCH 06/24] swr/rast: silence write of cfg graph
--- src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp index 2009db0..49b06f7 100644 --- a/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp +++ b/src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp @@ -264,10 +264,10 @@ void JitManager::DumpToFile(Function *f, const char *fileName) #endif fd.flush(); -raw_fd_ostream fd_cfg(fName, EC, llvm::sys::fs::F_Text); -WriteGraph(fd_cfg, (const Function*)f); +//raw_fd_ostream fd_cfg(fName, EC, llvm::sys::fs::F_Text); +//WriteGraph(fd_cfg, (const Function*)f); -fd_cfg.flush(); +//fd_cfg.flush(); } } -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 20/24] swr/rast: move wireframe/point triangle binning after culling
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 156 ++--- 1 file changed, 76 insertions(+), 80 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index 4667b48..b3fe4cf 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -561,45 +561,6 @@ void BinTriangles( RDTSC_EVENT(FECullZeroAreaAndBackface, _mm_popcnt_u32(origTriMask ^ triMask), 0); } -{ -// Simple non-conformant wireframe mode, useful for debugging -if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) -{ -// construct 3 SIMD lines out of the triangle and call the line binner for each SIMD -simdvector line[2]; -simdscalar recipW[2]; -line[0] = tri[0]; -line[1] = tri[1]; -recipW[0] = vRecipW0; -recipW[1] = vRecipW1; -BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[1]; -line[1] = tri[2]; -recipW[0] = vRecipW1; -recipW[1] = vRecipW2; -BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[2]; -line[1] = tri[0]; -recipW[0] = vRecipW2; -recipW[1] = vRecipW0; -BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -AR_END(FEBinTriangles, 1); -return; -} -else if (rastState.fillMode == SWR_FILLMODE_POINT) -{ -// bin 3 points - -BinPostSetupPoints(pDC, pa, workerId, &tri[0], triMask, primID, viewportIdx); -BinPostSetupPoints(pDC, pa, workerId, &tri[1], triMask, primID, viewportIdx); -BinPostSetupPoints(pDC, pa, workerId, &tri[2], triMask, primID, viewportIdx); -return; -} -} - /// Note: these variable initializations must stay above any 'goto endBenTriangles' // compute per tri backface uint32_t frontFaceMask = frontWindingTris; @@ -737,9 +698,43 @@ void BinTriangles( triMask = triMask & ~maskOutsideScissor; } -if (!triMask) +endBinTriangles: + +// Send surviving triangles to the line or point binner based on fill mode +if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) { -goto endBinTriangles; +// Simple non-conformant wireframe mode, useful for debugging. +// Construct 3 SIMD lines out of the triangle and call the line binner for each SIMD +simdvector line[2]; +simdscalar recipW[2]; +line[0] = tri[0]; +line[1] = tri[1]; +recipW[0] = vRecipW0; +recipW[1] = vRecipW1; +BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); + +line[0] = tri[1]; +line[1] = tri[2]; +recipW[0] = vRecipW1; +recipW[1] = vRecipW2; +BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); + +line[0] = tri[2]; +line[1] = tri[0]; +recipW[0] = vRecipW2; +recipW[1] = vRecipW0; +BinPostSetupLines(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); + +AR_END(FEBinTriangles, 1); +return; +} +else if (rastState.fillMode == SWR_FILLMODE_POINT) +{ +// Bin 3 points +BinPostSetupPoints(pDC, pa, workerId, &tri[0], triMask, primID, viewportIdx); +BinPostSetupPoints(pDC, pa, workerId, &tri[1], triMask, primID, viewportIdx); +BinPostSetupPoints(pDC, pa, workerId, &tri[2], triMask, primID, viewportIdx); +return; } // Convert triangle bbox to macrotile units. @@ -777,8 +772,6 @@ void BinTriangles( _simd_store_si((simdscalari*)aRTAI, _simd_setzero_si()); } -endBinTriangles: - // scan remaining valid triangles and bin each separately while (_BitScanForward(&triIndex, triMask)) { @@ -997,36 +990,6 @@ void SIMDAPI BinTriangles_simd16( RDTSC_EVENT(FECullZeroAreaAndBackface, _mm_popcnt_u32(origTriMask ^ triMask), 0); } -{ -// Simple non-conformant wireframe mode, useful for debugging -if (rastState.fillMode == SWR_FILLMODE_WIREFRAME) -{ -// construct 3 SIMD lines out of the triangle and call the line binner for each SIMD -simd16vector line[2]; -simd16scalar recipW[2]; -line[0] = tri[0]; -line[1] = tri[1]; -recipW[0] = vRecipW0; -recipW[1] = vRecipW1; -BinPostSetupLines_simd16(pDC, pa, workerId, line, recipW, triMask, primID, viewportIdx); - -line[0] = tri[1]; -line[1] = tri[2]; -recipW[0] = vRecipW1; -recipW[1] = vRecipW2; -BinPostSetupLines_simd16(pD
[Mesa-dev] [PATCH 24/24] swr/rast: code cleanup (no functional change)
--- src/gallium/drivers/swr/rasterizer/core/binner.cpp | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp index b3fe4cf..daadd5f 100644 --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp @@ -674,10 +674,14 @@ void BinTriangles( scisYmax = _simd_set1_epi32(state.scissorsInFixedPoint[0].ymax); } +// Make triangle bbox inclusive +bbox.xmax = _simd_sub_epi32(bbox.xmax, _simd_set1_epi32(1)); +bbox.ymax = _simd_sub_epi32(bbox.ymax, _simd_set1_epi32(1)); + bbox.xmin = _simd_max_epi32(bbox.xmin, scisXmin); bbox.ymin = _simd_max_epi32(bbox.ymin, scisYmin); -bbox.xmax = _simd_min_epi32(_simd_sub_epi32(bbox.xmax, _simd_set1_epi32(1)), scisXmax); -bbox.ymax = _simd_min_epi32(_simd_sub_epi32(bbox.ymax, _simd_set1_epi32(1)), scisYmax); +bbox.xmax = _simd_min_epi32(bbox.xmax, scisXmax); +bbox.ymax = _simd_min_epi32(bbox.ymax, scisYmax); if (CT::IsConservativeT::value) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 21/24] swr/rast: allow early-z if shader uses depth value
--- src/gallium/drivers/swr/rasterizer/core/api.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp b/src/gallium/drivers/swr/rasterizer/core/api.cpp index 1d581ac..a463790 100644 --- a/src/gallium/drivers/swr/rasterizer/core/api.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp @@ -804,7 +804,7 @@ void SetupPipeline(DRAW_CONTEXT *pDC) const uint32_t forcedSampleCount = (rastState.forcedSampleCount) ? 1 : 0; const bool bMultisampleEnable = ((rastState.sampleCount > SWR_MULTISAMPLE_1X) || forcedSampleCount) ? 1 : 0; const uint32_t centroid = ((psState.barycentricsMask & SWR_BARYCENTRIC_CENTROID_MASK) > 0) ? 1 : 0; -const uint32_t canEarlyZ = (psState.forceEarlyZ || (!psState.writesODepth && !psState.usesSourceDepth && !psState.usesUAV)) ? 1 : 0; +const uint32_t canEarlyZ = (psState.forceEarlyZ || (!psState.writesODepth && !psState.usesUAV)) ? 1 : 0; SWR_BARYCENTRICS_MASK barycentricsMask = (SWR_BARYCENTRICS_MASK)psState.barycentricsMask; // select backend function -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 23/24] swr/rast: whitespace changes
--- src/gallium/drivers/swr/rasterizer/core/api.cpp | 8 ++-- src/gallium/drivers/swr/rasterizer/core/context.h | 2 -- src/gallium/drivers/swr/rasterizer/core/state.h | 1 + 3 files changed, 3 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp b/src/gallium/drivers/swr/rasterizer/core/api.cpp index a463790..5dd4dc3 100644 --- a/src/gallium/drivers/swr/rasterizer/core/api.cpp +++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp @@ -329,7 +329,6 @@ DRAW_CONTEXT* GetDrawContext(SWR_CONTEXT *pContext, bool isSplitDraw = false) pCurDrawContext->drawId = pContext->dcRing.GetHead(); pCurDrawContext->cleanupState = true; - } else { @@ -783,10 +782,12 @@ void SetupMacroTileScissors(DRAW_CONTEXT *pDC) } } + // templated backend function tables extern PFN_BACKEND_FUNC gBackendNullPs[SWR_MULTISAMPLE_TYPE_COUNT]; extern PFN_BACKEND_FUNC gBackendSingleSample[SWR_INPUT_COVERAGE_COUNT][2][2]; extern PFN_BACKEND_FUNC gBackendSampleRateTable[SWR_MULTISAMPLE_TYPE_COUNT][SWR_INPUT_COVERAGE_COUNT][2][2]; + void SetupPipeline(DRAW_CONTEXT *pDC) { DRAW_STATE* pState = pDC->pState; @@ -1133,7 +1134,6 @@ void DrawInstanced( pState->rastState.cullMode = SWR_CULLMODE_NONE; } - int draw = 0; while (remainingVerts) { @@ -1174,7 +1174,6 @@ void DrawInstanced( pDC = GetDrawContext(pContext); pDC->pState->state.rastState.cullMode = oldCullMode; - AR_API_END(APIDraw, numVertices * numInstances); } @@ -1276,7 +1275,6 @@ void DrawIndexedInstance( pState->rastState.cullMode = SWR_CULLMODE_NONE; } - while (remainingIndices) { uint32_t numIndicesForDraw = (remainingIndices < maxIndicesPerDraw) ? @@ -1322,7 +1320,6 @@ void DrawIndexedInstance( pDC = GetDrawContext(pContext); pDC->pState->state.rastState.cullMode = oldCullMode; - AR_API_END(APIDrawIndexed, numIndices * numInstances); } @@ -1657,7 +1654,6 @@ void SwrInit() InitBackendFuncTables(); } - void SwrGetInterface(SWR_INTERFACE &out_funcs) { out_funcs.pfnSwrCreateContext = SwrCreateContext; diff --git a/src/gallium/drivers/swr/rasterizer/core/context.h b/src/gallium/drivers/swr/rasterizer/core/context.h index 7781fea..62332db 100644 --- a/src/gallium/drivers/swr/rasterizer/core/context.h +++ b/src/gallium/drivers/swr/rasterizer/core/context.h @@ -418,8 +418,6 @@ struct DRAW_CONTEXT volatile int32_tthreadsDone; SYNC_DESC retireCallback; // Call this func when this DC is retired. - - }; static_assert((sizeof(DRAW_CONTEXT) & 63) == 0, "Invalid size for DRAW_CONTEXT"); diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h b/src/gallium/drivers/swr/rasterizer/core/state.h index 75d1210..364a898 100644 --- a/src/gallium/drivers/swr/rasterizer/core/state.h +++ b/src/gallium/drivers/swr/rasterizer/core/state.h @@ -1046,6 +1046,7 @@ struct SWR_RASTSTATE uint8_t clipDistanceMask; }; + enum SWR_CONSTANT_SOURCE { SWR_CONSTANT_SOURCE_CONST_, -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/24] swr/rast: fix _simd16_movemask_(ps, pd) native AVX512 intrinsics
--- src/gallium/drivers/swr/rasterizer/common/simd16intrin.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h index aa47574..2fe18f2 100644 --- a/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h +++ b/src/gallium/drivers/swr/rasterizer/common/simd16intrin.h @@ -539,8 +539,6 @@ INLINE int SIMDAPI _simd16_testz_ps(simd16scalar a, simd16scalar b) return lo & hi; } -#define _simd16_cmplt_epi32(a, b) _simd16_cmpgt_epi32(b, a) - SIMD16_EMU_AVX512_2(simd16scalar, _simd16_unpacklo_ps, _simd_unpacklo_ps) SIMD16_EMU_AVX512_2(simd16scalar, _simd16_unpackhi_ps, _simd_unpackhi_ps) SIMD16_EMU_AVX512_2(simd16scalard, _simd16_unpacklo_pd, _simd_unpacklo_pd) @@ -898,12 +896,14 @@ INLINE simd16scalari SIMDAPI _simd16_blendv_epi32(simd16scalari a, simd16scalari INLINE simd16mask SIMDAPI _simd16_movemask_ps(simd16scalar a) { -return _simd16_scalari2mask(_mm512_castps_si512(a)); +// movemask_ps only checks the top bit of the float single elements +return _simd16_scalari2mask(_mm512_and_si512(_mm512_castps_si512(a), _mm512_set1_epi32(0x8000))); } INLINE simd16mask SIMDAPI _simd16_movemask_pd(simd16scalard a) { -return _simd16_scalard2mask(a); +// movemask_pd only checks the top bit of the float double elements +return _simd16_scalard2mask(_mm512_castsi512_pd(_mm512_and_si512(_mm512_castpd_si512(a), _mm512_set1_epi64(0x8000; } #if 0 -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 101214] xdriinfo and libglvnd Screen 0: not direct rendering capable
https://bugs.freedesktop.org/show_bug.cgi?id=101214 Bug ID: 101214 Summary: xdriinfo and libglvnd Screen 0: not direct rendering capable Product: Mesa Version: git Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: normal Priority: medium Component: Other Assignee: mesa-dev@lists.freedesktop.org Reporter: lonew...@xs4all.nl QA Contact: mesa-dev@lists.freedesktop.org Created attachment 131543 --> https://bugs.freedesktop.org/attachment.cgi?id=131543&action=edit LD_DEBUG=libs xdriinfo with libglvnd installed Hi, with mesa-git master built against libglvnd , xdriinfo can't detect the dri-driver in use, glxinfo works fine . When the same mesa-git version is built with libglvnd explicitly disabled, xdriinfo works as expected. https://bugzilla.redhat.com/show_bug.cgi?id=1429894 appears to be about the same bug and concludes it's a mesa bug. That bug report also indicates Fedora Core has solved this by patching mesa though i can't find the patch they used. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/30] i965: Overhaul resolves
On Fri, May 26, 2017 at 5:34 PM, Jason Ekstrand wrote: > On Fri, May 26, 2017 at 4:30 PM, Jason Ekstrand > wrote: > >> This patch series does a complete overhaul of the current resolve handling >> framework inside the i916 OpenGL driver. For HiZ and MCS, the current >> resolve code is ok but not optimal. For CCS, however, it's pretty bad. >> I've been looking at the code for a week now and I still don't know how >> Ben >> ever got it to do a partial resolve for his CCS modifiers series. So far >> as I can tell, it's not capable of doing so. The new resolve system is >> hopefully much easier to reason about. For users of the system, there are >> fewer entry-points and depth and color are no longer separate. The guts >> of >> the system are much more explicit and, thanks to the new isl_aux_state >> enum, should be easier to understand. >> > > One more little tidbit. For those of you who want some sort of proof that > the new system is easier to reason about, consider this: While debugging > patch 27, I only had a single rendering corruption (the Sandy Bridge bug > mentioned below); the rest of the bugs were all segfaults or assertion > failures. There were quite a few of those but they're way easier to find > than GPU hangs and corruption. > > >> As of my last Jenkins run, the series is still failing 2 piglit tests on >> Sandy Bridge and I have yet to do any benchmarking. However, I wanted to >> send it out early so that I could get feedback on the structure of the >> system as quickly as possible. Discussion of the structure can happen in >> parallel with final tweaking. Personally, I'm fairly happy with it and I >> think this looks like a good way to go but I'd like more eyes. >> > I did a bit of digging and it urns out the Sandy Bridge bug isn't my fault. :-) It turns out we've been doing sandy bridge HiZ and stencil allocation wrong ever since we enabled layered rendering. I did a bit of digging this morning and I think I now understand gen6 HiZ well enough to fix it but my initial attempt to fix it didn't quite work. Hopefully, I'll have a fix early next week. > The patch series itself is organized as follows >> >> * The first 13 patches are various cleanups which make later patches >>simpler. They should be fairly benign. These can easily land on their >>own as I think most of them are good clean-ups anyway. >> >> * Patch 14 adds the new isl_aux_state enum and the accompanying comment >> >> * Patch 15 adds the new interface for doing resolves. All of the >>functions are just dummies which call the already existing functions. >> >> * Patches 16-26 convert everything over to using the new resolve >>interface. I tried as hard as I could to not make any functional >>changes while doing so. If you see any, they are probably bugs! >> >> * Patch 27 wholesale replaces the current color resolve scheme with a new >>one based on isl_aux_state. It's a bit unfortunate that it all had to >>happen in one go but it's not easy to switch resolve schemes slowly. >> >> * Patch 28 replaces the HiZ resolve framework. This one is not nearly as >>drastic as patch 27 because the current HiZ framework is already pretty >>good. >> >> * Patch 29 deletes the now unused intel_resolve_map struct >> >> * Patch 30 enables fast-clears for non-CCS_E capable surfaces. In >>particular, this gives us fast-clears for sRGB. >> >> I'd appreciate it if the initial review focussed on patches 14, 15, and >> 27. >> Those are where you see the new resolve system in action. >> >> This series is available here: >> >> https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=review/i >> 965-resolve-rework >> >> Happy Reviewing! >> >> --Jason Ekstrand >> >> Cc: Chad Versace >> Cc: Kenneth Graunke >> Cc: Nanley Chery >> Cc: Topi Pohjolainen >> >> Jason Ekstrand (30): >> i965: Mark depth surfaces as needing a HiZ resolve after blitting >> i965/surface_state: Images can't handle CCS at all >> intel/isl: Add a helper for determining if a color is 0/1 >> i965/miptree: Store fast clear colors in an isl_color_value >> i965/miptree: Clean up the depth resolve helpers a little >> i965/miptree: Refactor intel_miptree_resolve_color >> i965: Get rid of intel_renderbuffer_resolve_* >> i965: Inline renderbuffer_att_set_needs_depth_resolve >> i965/miptree: Move color resolve on map to intel_miptree_map >> i965/blorp: Take an explicit fast clear op in resolve_color >> i965/blorp: Refactor do_single_blorp_clear >> i965/blorp: Move MCS allocation earlier for clears >> i965: Combine render target resolve code >> intel/isl: Add an enum for describing auxiliary compression state >> i965/miptree: Add new entrypoints for resolve management >> i965: Use the new resolve function for several simple cases >> i965: Finalize miptrees before prepare_texture >> i965: Move texturing to the new resolve functions >> i965: Move color rendering to the new resolve functions >>
[Mesa-dev] [Bug 101199] nouveau_screen.c: undefined reference to `nouveau_drm_del'
https://bugs.freedesktop.org/show_bug.cgi?id=101199 Andrey changed: What|Removed |Added QA Contact|nouveau@lists.freedesktop.o |mesa-dev@lists.freedesktop. |rg |org Resolution|NOTOURBUG |--- Status|RESOLVED|REOPENED Assignee|nouveau@lists.freedesktop.o |mesa-dev@lists.freedesktop. |rg |org Component|Drivers/DRI/nouveau |Other --- Comment #4 from Andrey --- This is an issue with the Mesa build system. The configure script should choose right libdrm from PKG_CONFIG_PATH specified. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev