[Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Rowley, Timothy O
Hi.  I'd like to introduce the Mesa3D community to a software project
that we hope to upstream.  We're a small team at Intel working on
software defined visualization (http://sdvis.org/), and have
open source projects in both the raytracing (Embree, OSPRay) and
rasterization (OpenSWR) realms.

We're a different Intel team from that of i965 fame, with a different
type of customer and workloads.  Our customers have large clusters of
compute nodes that for various reasons do not have GPUs, and are
working with extremely large geometry models.

We've been working on a high performance, highly scalable rasterizer
and driver to interface with Mesa3D.  Our rasterizer functions as a
"software gpu", relying on the mature well-supported Mesa3D to provide
API and state tracking layers.

We would like to contribute this code to Mesa3D and continue doing
active development in your source repository.  We welcome discussion
about how this will happen and questions about the project itself.
Below are some answers to what we think might be frequently asked
questions.

Bruce and I will be the public contacts for this project, but this
project isn't solely our work - there's a dedicated group of people
working on the core SWR code.

  Tim Rowley
  Bruce Cherniak

  Intel Corporation

Why another software rasterizer?
---

Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa3D tree. Two important reasons for this:

 * Architecture - given our focus on scientific visualization, our
   workloads are much different than the typical game; we have heavy
   vertex load and relatively simple shaders.  In addition, the core
   counts of machines we run on are much higher.  These parameters led
   to design decisions much different than llvmpipe.

 * Historical - Intel had developed a high performance software
   graphics stack for internal purposes.  Later we adapted this
   graphics stack for use in visualization and decided to move forward
   with Mesa3D to provide a high quality API layer while at the same
   time benefiting from the excellent performance the software
   rasterizer gives us.

What's the architecture?
---

SWR is a tile based immediate mode renderer with a sort-free threading
model which is arranged as a ring of queues.  Each entry in the ring
represents a draw context that contains all of the draw state and work
queues.  An API thread sets up each draw context and worker threads
will execute both the frontend (vertex/geometry processing) and
backend (fragment) work as required.  The ring allows for backend
threads to pull work in order.  Large draws are split into chunks to
allow vertex processing to happen in parallel, with the backend work
pickup preserving draw ordering.
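
As an illustration only (not SWR source), the chunking and in-order
pickup described above can be sketched as follows.  Threading is left
out to keep the example deterministic, and every name (Chunk,
splitDraw, DrawQueue) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not SWR's API.
struct Chunk {
    uint32_t draw;       // index of the owning draw, in submission order
    uint32_t firstVert;  // first vertex covered by this chunk
    uint32_t numVerts;   // number of vertices in this chunk
};

// Split one draw into chunks of at most chunkSize vertices, so that
// frontend workers could process the chunks independently.
std::vector<Chunk> splitDraw(uint32_t draw, uint32_t numVerts, uint32_t chunkSize) {
    assert(chunkSize > 0);
    std::vector<Chunk> chunks;
    for (uint32_t first = 0; first < numVerts;) {
        const uint32_t count = std::min(chunkSize, numVerts - first);
        chunks.push_back(Chunk{draw, first, count});
        first += count;
    }
    return chunks;
}

// Tracks outstanding frontend chunks per draw and releases draws to the
// backend strictly in submission order.
class DrawQueue {
public:
    // Register a draw and return the chunks the frontend has to process.
    std::vector<Chunk> submit(uint32_t numVerts, uint32_t chunkSize) {
        const uint32_t draw = static_cast<uint32_t>(pendingChunks_.size());
        std::vector<Chunk> chunks = splitDraw(draw, numVerts, chunkSize);
        pendingChunks_.push_back(chunks.size());
        return chunks;
    }

    // Record that the frontend finished one chunk (in any order).
    void chunkDone(const Chunk& chunk) {
        assert(chunk.draw < pendingChunks_.size());
        assert(pendingChunks_[chunk.draw] > 0);
        --pendingChunks_[chunk.draw];
    }

    // Return the draws whose backend work may start now, oldest first.
    // A draw is held back until every earlier draw has been released.
    std::vector<uint32_t> retireReady() {
        std::vector<uint32_t> ready;
        while (nextToRetire_ < pendingChunks_.size() &&
               pendingChunks_[nextToRetire_] == 0) {
            ready.push_back(static_cast<uint32_t>(nextToRetire_++));
        }
        return ready;
    }

private:
    std::vector<std::size_t> pendingChunks_;  // unfinished chunks per draw
    std::size_t nextToRetire_ = 0;            // oldest draw not yet released
};
```

In this sketch a later draw can finish its frontend work first, but
retireReady() only hands it to the backend once all earlier draws
have been handed over.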

Our pipeline uses just-in-time compiled code for the fetch shader that
does vertex attribute gathering and AOS to SOA conversions, the vertex
shader and fragment shaders, streamout, and fragment blending. SWR
core also supports geometry and compute shaders but we haven't exposed
them through our driver yet. The fetch shader, streamout, and blend are
built internally to swr core using LLVM directly, while for the vertex
and pixel shaders we reuse bits of llvmpipe from
gallium/auxiliary/gallivm to build the kernels, which we wrap
differently than llvmpipe's auxiliary/draw code.
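
For readers unfamiliar with the AOS to SOA step mentioned above, here
is a minimal scalar sketch of what such a conversion does.  It is not
the jitted fetch shader; the types Vec4 and SimdVec4 and the width of
8 lanes (one AVX register of floats) are assumptions made for the
example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures element: one vertex attribute as stored by the app.
struct Vec4 { float x, y, z, w; };

// Assumed SIMD width: 8 floats, as in one 256-bit AVX register.
constexpr std::size_t kSimdWidth = 8;

// Structure-of-arrays block: each component holds kSimdWidth lanes, so a
// shader can operate on the x of 8 vertices with a single instruction.
struct SimdVec4 {
    std::array<float, kSimdWidth> x, y, z, w;
};

// Gather up to kSimdWidth vertices starting at 'first' into SOA layout.
// Lanes past the end of the vertex array are left at zero.
SimdVec4 gatherAosToSoa(const std::vector<Vec4>& verts, std::size_t first) {
    SimdVec4 out{};  // value-initialized: all lanes start at 0.0f
    for (std::size_t lane = 0;
         lane < kSimdWidth && first + lane < verts.size(); ++lane) {
        const Vec4& v = verts[first + lane];
        out.x[lane] = v.x;
        out.y[lane] = v.y;
        out.z[lane] = v.z;
        out.w[lane] = v.w;
    }
    return out;
}
```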

What's the performance?
---

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe.  This is to be expected, as
llvmpipe only threads the fragment processing and not the geometry
frontend.

The linked slide below shows some performance numbers from a benchmark
dataset and application.  On a dual-socket E5-2699v3 system (36 cores
total) we see performance 29x to 51x that of llvmpipe.

http://openswr.org/slides/SWR_Sept15.pdf

While our current performance is quite good, we know there is more
potential in this architecture.  When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely, some due to
interface issues that need tuning, some due to differences in shader code
generation, and some due to conformance and feature additions to the
core swr.  We are looking to recover most of this performance.

What's the conformance?
---

The major applications we are targeting are all based on the
Visualization Toolkit (VTK), and as such our development efforts have
been focused on making sure these work as best as possible.  Our
current code passes vtk's rendering tests with their new "OpenGL2"
(really OpenGL 3.2) backend at 99%.

piglit testing shows a much lower pass rate, roughly 80% at the time
of writing.  Core SWR undergoes rigorous unit testing and we are quite
confident in the rasterizer, and understand the areas where it
currently has issues (example: line rendering is done with triangles,
so doesn't match the strict line rendering rules).  The majority of
the piglit failures ar

Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Rowley, Timothy O

> On Oct 20, 2015, at 12:44 PM, Ilia Mirkin  wrote:
> 
> On Tue, Oct 20, 2015 at 1:43 PM, Ilia Mirkin  wrote:
>> On Tue, Oct 20, 2015 at 1:11 PM, Rowley, Timothy O
>>  wrote:
>>> Does one build work on both AVX and AVX2?
>>> -
>>> 
>>> * Unfortunately, no.  The architecture support is fixed at compile
>>>   time.  While the AVX version of course will run on AVX2 machines
>>>   and the jitted code will use AVX2, the overall performance will
>>>   suffer relative to a full AVX2 build.
>>> 
>>> * There is some idea that if we move some code from the driver back
>>>   to SWR core, we could build two versions of libSWR and dynamically
>>>   load the correct version at runtime.  Unfortunately this mechanism
>>>   would not work with AVX512, as some of the SWR state structures
>>>   would change size.
>> 
>> Without commenting on any of the other issues, I believe one of your
>> stated goals is to ease distribution to your end-users. If you expect
>> them to build their own code, that's no problem. However if you're
>> thinking of relying on distros to include your driver and have end
>> users use that, then you should consider some solution that enables
>> runtime selection of this stuff (even if that's building 3 versions of
>> the driver -- swr-avx, swr-avx2, swr-avx512, and having e.g. loader
>> magic determine which the right one is for the current CPU).

We’ve found that the large clusters tend to roll their own user environment 
specific to their system configuration, so this problem of binary support 
hasn’t been an immediate concern for the initial users.  We hadn’t considered 
building complete driver/core-swr combinations behind a loader; we’ll consider 
this as a possibility for avx512.

Most of the code movement to make runtime selection at the interface layer 
between core SWR and the driver has been done; we would need to verify any 
stray AVX/AVX2 architecture differences in the driver and add loader logic.
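
A rough sketch of what such loader logic could look like, assuming a
GCC or Clang toolchain on x86 (the __builtin_cpu_supports builtin) and
using the variant names suggested in the quoted message (swr-avx,
swr-avx2, swr-avx512).  The enum and function names are hypothetical,
and nothing here reflects an existing loader.

```cpp
#include <cassert>
#include <string>

enum class SimdLevel { None, AVX, AVX2, AVX512 };

// Query the host CPU.  __builtin_cpu_supports is a GCC/Clang builtin for
// x86; on other compilers or architectures this sketch reports None.
inline SimdLevel detectSimdLevel() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return SimdLevel::AVX512;
    if (__builtin_cpu_supports("avx2"))    return SimdLevel::AVX2;
    if (__builtin_cpu_supports("avx"))     return SimdLevel::AVX;
#endif
    return SimdLevel::None;
}

// Map the detected level to the driver variant a loader would open.
// An empty string means no SWR variant is usable on this CPU.
inline std::string pickDriverVariant(SimdLevel level) {
    switch (level) {
        case SimdLevel::AVX512: return "swr-avx512";
        case SimdLevel::AVX2:   return "swr-avx2";
        case SimdLevel::AVX:    return "swr-avx";
        case SimdLevel::None:   break;
    }
    return "";
}
```

A loader would call pickDriverVariant(detectSimdLevel()) once at
startup and fall back to another driver when the result is empty.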

-Tim

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-20 Thread Rowley, Timothy O

> On Oct 20, 2015, at 4:23 PM, Jose Fonseca  wrote:
> 
> I tried it on my i7-5500U, but I run into two issues:
> 
> - OpenSWR seems to only use 2 threads (even though my system supports 4 
> threads)
> 
> - and even when I compensate llvmpipe to only use 2 rasterizer threads, I 
> still only get half the framerate of llvmpipe with the "gloss" Mesa demo (a 
> very simple texturing demo):
> 
> $ ./gloss
> SWR create screen!
> This processor supports AVX2.
> 720 frames in 5.004 seconds = 143.885 FPS
> 737 frames in 5.005 seconds = 147.253 FPS
> 729 frames in 5.004 seconds = 145.683 FPS
> 732 frames in 5.002 seconds = 146.341 FPS
> 735 frames in 5.001 seconds = 146.971 FPS
> [...]
> $ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
> 1539 frames in 5.002 seconds = 307.677 FPS
> 1719 frames in 5 seconds = 343.8 FPS
> 1780 frames in 5.002 seconds = 355.858 FPS
> 1497 frames in 5.002 seconds = 299.28 FPS
> 1548 frames in 5.001 seconds = 309.538 FPS
> [..]
> 
> I see similar ratio with more complex  workload with the trace from:
> 
>  http://people.freedesktop.org/~jrfonseca/traces/furmark-1.8.2-svga.trace
> 
> (you'll need to download https://github.com/apitrace/apitrace and build)
> 
> My questions are:
> 
> - Is this the expected performance when texturing is used? Or is there 
> something wrong with my setup?
> 

Two things are happening here to cause the behavior you’re seeing.  First, 
OpenSWR only creates as many threads as there are physical cores.  On our 
workloads, going beyond that and using hyperthreads gave minimal or even 
negative performance gains.  Second, one thread is reserved for the API, which 
does not participate in either frontend (geometry) or backend (fragment) work.  
Thus on your two core 5500U OpenSWR only had one raster thread versus 
llvmpipe’s two, giving half the performance.  If you want to switch OpenSWR to 
using hyperthreads, set the environment variable KNOB_MAX_THREADS_PER_CORE=0.
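
The arithmetic behind those thread counts can be written down as a
small sketch.  The function name is hypothetical, and the knob
semantics are an assumption inferred from this thread: a default of
one thread per physical core, with 0 meaning "no per-core limit".

```cpp
#include <algorithm>
#include <cassert>

// Number of worker (raster) threads under the policy described above.
// maxThreadsPerCore mirrors KNOB_MAX_THREADS_PER_CORE; treating 0 as
// "use every hardware thread" is an assumption of this sketch.
unsigned workerThreads(unsigned physicalCores,
                       unsigned hwThreadsPerCore,
                       unsigned maxThreadsPerCore) {
    assert(physicalCores > 0 && hwThreadsPerCore > 0);
    const unsigned perCore = (maxThreadsPerCore == 0)
        ? hwThreadsPerCore
        : std::min(maxThreadsPerCore, hwThreadsPerCore);
    const unsigned total = physicalCores * perCore;
    // One thread is reserved for the API; always keep at least one worker.
    return std::max(1u, total - 1);
}
```

For a two-core, four-thread part such as the 5500U this gives one
worker by default and three when the per-core limit is lifted.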

>  I understand that OpenSWR actually leverages llvmpipe (well gallivm's) code 
> for texture sampling, so I was expecting a smaller gap.

Yes, we use gallivm’s texture sampler so our performance should be similar on 
texture-limited workloads.  I tried a quick test of openarena on a 4-core 
machine and the performance delta was about 6% (default N-1 OpenSWR worker 
threads).

> - What exactly was the benchmark used for SWR_Sept15.pdf's figures ? Was 
> there any texture sampling used on it, or was it just simple lighting?

I don’t have the apitrace in front of me, but I believe the turbulence data was 
two-sided lit, with a textured plane.

Tim



Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-21 Thread Rowley, Timothy O

> On Oct 20, 2015, at 2:03 PM, Roland Scheidegger  wrote:
> 
> Certainly looks interesting...
> From a high level point of view, seems quite similar to llvmpipe (both
> tile based, using llvm for jitting shaders, ...). Of course llvmpipe
> isn't well suited for these kind of workloads (the most important use
> case is desktop compositing, so a couple dozen vertices per frame but
> millions of pixels...). Making vertex loads scale is something which
> just wasn't worth the effort so far (there's not actually that many
> people working on llvmpipe), albeit we realize that the completely
> non-parallel nature of it currently actually can hinder scaling quite a
> bit even for "typical" workloads (not desktop compositing, but "simple"
> 3d apps) once you've got enough cores/threads (8 or so), but that's
> something we're not worried too much about.
> I think requiring llvm 3.6 probably isn't going to work if you want to
> upstream this, a minimum version of 3.6 is fine but the general rule is
> things should still work with newer versions (including current
> development version, seems like you're using c++ interface of llvm quite
> a bit so that's probably going to require some #ifdef mess). Albeit I
> guess if you just don't try to build the driver with non-released
> versions that's probably ok (but will limit the ability for some people
> to try out your driver).

Some differences between llvmpipe and swr based on my understanding of 
llvmpipe’s architecture:

threading model
    llvmpipe: single threaded vertex processing, up to 16 rasterization
              threads
    swr: common thread pool that picks up frontend or backend work as
         available
vertex processing
    llvmpipe: entire draw call processed in a single pass
    swr: large draws chopped into chunks that can be processed in parallel
frontend/backend coupling
    llvmpipe: separate binning pass in single threaded frontend
    swr: frontend vertex processing and binning combined in a single pass
primitive assembly and binning
    llvmpipe: scalar c code
    swr: x86 avx/avx2 working on vector of primitives
fragment processing
    llvmpipe: single jitted shader combining depth/fragment/stencil/blend
              on 16x16 block
    swr: separate jitted fragment and blend shaders, plus templated depth
         test
in-memory representation
    llvmpipe: direct access to render targets
    swr: hot-tile working representation with load and/or store at required
         times

As you say, we do use LLVM’s C++ API.  While that has some advantages, it’s not 
guaranteed to be stable and can/does make nontrivial changes.  3.6 to 3.7 made 
some change to at least the GEP instruction which we could work around if 
necessary for upstreaming.

-Tim



Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-10-21 Thread Rowley, Timothy O

> On Oct 20, 2015, at 5:58 PM, Jose Fonseca  wrote:
> 
> Thanks for the explanations.  It's closer now, but still a bit of gap:
> 
> $ KNOB_MAX_THREADS_PER_CORE=0 ./gloss
> SWR create screen!
> This processor supports AVX2.
> --> numThreads = 3
> 1102 frames in 5.002 seconds = 220.312 FPS
> 1133 frames in 5.001 seconds = 226.555 FPS
> 1130 frames in 5.002 seconds = 225.91 FPS
> ^C
> $ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
> 1456 frames in 5 seconds = 291.2 FPS
> 1617 frames in 5.003 seconds = 323.206 FPS
> 1571 frames in 5.002 seconds = 314.074 FPS

A bit more of an apples to apples comparison might be single-threaded llvmpipe 
(LP_NUM_THREADS=1) and single-threaded swr (KNOB_SINGLE_THREADED=1).  Running 
gloss and glxgears (another favorite “benchmark” :) ) under these conditions 
shows swr running a bit slower, though a little closer than your numbers.  
Examining performance traces, we think swr’s concept of hot-tiles, the working 
memory representation of the render target, and the associated load/store 
functions contribute to most of the difference.  We might be able to optimize 
those conversions; additionally fast clear would help these demos.  For larger 
workloads this small per-frame cost doesn’t really affect the performance.
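
To make the hot-tile cost concrete, here is a toy sketch of the
load/store conversion: a tile of an RGBA8 render target is converted
to float working storage and written back.  The tile size, struct
layout and function names are all made up for the example and are not
SWR's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kTileDim = 4;  // hypothetical tile edge, in pixels

// Working representation: float RGBA per pixel of one tile.
struct HotTile {
    float texel[kTileDim][kTileDim][4];
};

// Linear render target with 8-bit normalized RGBA pixels.
struct Surface {
    int width;
    int height;
    std::vector<uint8_t> rgba8;  // width * height * 4 bytes
};

// Byte offset of channel c of pixel (px, py) in the surface.
inline std::size_t texelOffset(const Surface& s, int px, int py, int c) {
    assert(px >= 0 && px < s.width && py >= 0 && py < s.height);
    return (static_cast<std::size_t>(py) * s.width + px) * 4 + c;
}

// Load: convert one tile from the render target into the hot tile.
void loadHotTile(const Surface& s, int tileX, int tileY, HotTile& tile) {
    for (int y = 0; y < kTileDim; ++y)
        for (int x = 0; x < kTileDim; ++x)
            for (int c = 0; c < 4; ++c)
                tile.texel[y][x][c] =
                    s.rgba8[texelOffset(s, tileX * kTileDim + x,
                                        tileY * kTileDim + y, c)] / 255.0f;
}

// Store: clamp, convert back to 8 bits and write the tile out.
void storeHotTile(const HotTile& tile, int tileX, int tileY, Surface& s) {
    for (int y = 0; y < kTileDim; ++y)
        for (int x = 0; x < kTileDim; ++x)
            for (int c = 0; c < 4; ++c) {
                const float v =
                    std::min(1.0f, std::max(0.0f, tile.texel[y][x][c]));
                s.rgba8[texelOffset(s, tileX * kTileDim + x,
                                    tileY * kTileDim + y, c)] =
                    static_cast<uint8_t>(v * 255.0f + 0.5f);
            }
}
```

Every touched tile pays for both conversions once per frame, which is
a fixed cost that weighs more on tiny demos than on large workloads.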

> One final question: you said that one thread is reserved for the API, but I 
> see all threads (with top `H`) maxing up the CPU. So if the thread reserved 
> for the API is not doing vertex/fragment processing, then what is it using 
> 100% of a CPU thread for?

With a trivial application main loop and light api usage, the API thread is 
going to end up spending most of the time waiting for the other threads to 
finish work.

These initial observations from you and others regarding performance have been 
interesting.  Our performance work has focused on large workloads on high core 
count configurations.  There, decisions such as dedicating a core to the 
application/API may cost a little performance, but proportionally far less than 
on dual and quad core processors.  We’ll look into changes/tuning that will 
benefit both extremes, though we might end up having to concede that llvmpipe 
will be faster at glxgears. :-)  

> Final thoughts: I understand this project has its own history, but I echo 
> what Roland said -- it would be nice to unify with llvmpipe at one point, in 
> some way or fashion.  Our (VMware's) focus has been desktop composition, but 
> there's no reason why a single SW renderer can't satisfy both ends of the 
> spectrum, especially for JIT enable renderers, since they can emit at runtime 
> the code most suited for the workload.

We would be happy for someone to take some of the ideas from swr to speed up 
llvmpipe, but for now our development will continue on the swr core and driver. 
 We’re not planning on replacing llvmpipe - its intent of working on any 
architecture is admirable.  In the ideal world the solution would be something 
that combines the best traits of both rasterizers, but at this point the 
shortest path to having a performant solution for our customers is with swr. 

> That said, it's really nice seeing Mesa and Gallium enabling this sort of 
> experiments with SW rendering.

Yes, we were quite happy with how fast we were able to get a new driver 
functioning with gallium.  The major thing slowing us was the documentation, 
which is not uniform in coverage.  There was a lot of reading other drivers’ 
source to figure out how things were supposed to work.

-Tim



Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2015-11-10 Thread Rowley, Timothy O

> On Oct 22, 2015, at 4:17 PM, Jose Fonseca  wrote:
> 
> They do share a lot already, Mesa, gallium statetracker, and gallivm. If 
> further development in openswr is planned, it might require jumping through a 
> few hoops, but I think it's worth figuring out what it would take to get this 
> merged into master so that, whenever there are interface changes, openswr 
> won't get the short stick.

Yes, openswr and llvmpipe share a fair bit.  It is my hope that as we start 
working more on openswr performance, some of the effort will benefit both 
drivers.

We’re willing to jump through the hoops needed to merge into master.  To that 
end, I’ve pushed some updates that amongst other things allow us to support 
both llvm 3.6 and 3.7 (and possibly llvm-svn).  Are there any other hoops that 
spring to mind?

-Tim



Re: [Mesa-dev] [Mesa-stable] [PATCH] swr: [rasterizer core] Remove dead code Clipper::ClipScalar()

2017-02-06 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Feb 4, 2017, at 5:55 PM, Vinson Lee <v...@freedesktop.org> wrote:

Tested-by: Vinson Lee <v...@freedesktop.org>

On Thu, Feb 2, 2017 at 12:42 PM, Cherniak, Bruce
<bruce.chern...@intel.com> wrote:
I followed up with a v2 that includes the bugzilla reference.

Good point, I’ll look into following up with a patch to remove Clip().

Thanks for the quick review.

On Feb 2, 2017, at 2:26 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Reviewed-by: Ilia Mirkin <imir...@alum.mit.edu>

I got confused by this code as well when I was trying to understand
the clipper. I think the Clip() function can go too now in the .cpp
file (as well as the fwd decl in the header)?

On Thu, Feb 2, 2017 at 3:15 PM, Bruce Cherniak <bruce.chern...@intel.com> wrote:
Clipper::ClipScalar() is dead code and should be removed.  It is causing
an error with gcc-7 because it references a now defunct member.

CC: "13.0 17.0" <mesa-sta...@lists.freedesktop.org>
---
src/gallium/drivers/swr/rasterizer/core/clip.h | 39 --
1 file changed, 39 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 085e4a9..f19858f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -262,45 +262,6 @@ public:
   return _simd_movemask_ps(vClipCullMask);
   }

-    // clip a single primitive
-    int ClipScalar(PA_STATE& pa, uint32_t primIndex, float* pOutPos, float* pOutAttribs)
-    {
-        OSALIGNSIMD(float) inVerts[3 * 4];
-        OSALIGNSIMD(float) inAttribs[3 * KNOB_NUM_ATTRIBUTES * 4];
-
-        // transpose primitive position
-        __m128 verts[3];
-        pa.AssembleSingle(VERTEX_POSITION_SLOT, primIndex, verts);
-        _mm_store_ps(&inVerts[0], verts[0]);
-        _mm_store_ps(&inVerts[4], verts[1]);
-        _mm_store_ps(&inVerts[8], verts[2]);
-
-        // transpose attribs
-        uint32_t numScalarAttribs = this->state.linkageCount * 4;
-
-        int idx = 0;
-        DWORD slot = 0;
-        uint32_t mapIdx = 0;
-        uint32_t tmpLinkage = uint32_t(this->state.linkageMask);
-        while (_BitScanForward(&slot, tmpLinkage))
-        {
-            tmpLinkage &= ~(1 << slot);
-            // Compute absolute attrib slot in vertex array
-            uint32_t inputSlot = VERTEX_ATTRIB_START_SLOT + this->state.linkageMap[mapIdx++];
-            __m128 attrib[3];    // triangle attribs (always 4 wide)
-            pa.AssembleSingle(inputSlot, primIndex, attrib);
-            _mm_store_ps(&inAttribs[idx], attrib[0]);
-            _mm_store_ps(&inAttribs[idx + numScalarAttribs], attrib[1]);
-            _mm_store_ps(&inAttribs[idx + numScalarAttribs * 2], attrib[2]);
-            idx += 4;
-        }
-
-        int numVerts;
-        Clip(inVerts, inAttribs, numScalarAttribs, pOutPos, &numVerts, pOutAttribs);
-
-        return numVerts;
-    }
-
   // clip SIMD primitives
   void ClipSimd(const simdscalar& vPrimMask, const simdscalar& vClipMask, PA_STATE& pa, const simdscalari& vPrimId, const simdscalari& vViewportIdx)
   {
--
2.7.4

___
mesa-stable mailing list
mesa-sta...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-stable



Re: [Mesa-dev] [PATCH] swr: [rasterizer core] Removed unused clip code.

2017-02-06 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Feb 3, 2017, at 11:35 AM, Bruce Cherniak <bruce.chern...@intel.com> wrote:

Removed unused Clip() and FRUSTUM_CLIP_MASK define.
---
src/gallium/drivers/swr/rasterizer/core/clip.cpp | 22 --
src/gallium/drivers/swr/rasterizer/core/clip.h   |  4 
2 files changed, 26 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.cpp b/src/gallium/drivers/swr/rasterizer/core/clip.cpp
index 7b1e09d..0a6afe5 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.cpp
@@ -157,28 +157,6 @@ int ClipTriToPlane( const float *pInPts, int numInPts,
return i;
}

-
-
-void Clip(const float *pTriangle, const float *pAttribs, int numAttribs, float *pOutTriangles, int *numVerts, float *pOutAttribs)
-{
-    // temp storage to hold at least 6 sets of vertices, the max number that can be created during clipping
-    OSALIGNSIMD(float) tempPts[6 * 4];
-    OSALIGNSIMD(float) tempAttribs[6 * KNOB_NUM_ATTRIBUTES * 4];
-
-    // we opt to clip to viewport frustum to produce smaller triangles for rasterization precision
-    int NumOutPts = ClipTriToPlane(pTriangle, 3, pAttribs, numAttribs, tempPts, tempAttribs);
-    NumOutPts = ClipTriToPlane(tempPts, NumOutPts, tempAttribs, numAttribs, pOutTriangles, pOutAttribs);
-    NumOutPts = ClipTriToPlane(pOutTriangles, NumOutPts, pOutAttribs, numAttribs, tempPts, tempAttribs);
-    NumOutPts = ClipTriToPlane(tempPts, NumOutPts, tempAttribs, numAttribs, pOutTriangles, pOutAttribs);
-    NumOutPts = ClipTriToPlane(pOutTriangles, NumOutPts, pOutAttribs, numAttribs, tempPts, tempAttribs);
-    NumOutPts = ClipTriToPlane(tempPts, NumOutPts, tempAttribs, numAttribs, pOutTriangles, pOutAttribs);
-
-    SWR_ASSERT(NumOutPts <= 6);
-
-    *numVerts = NumOutPts;
-    return;
-}
-
void ClipTriangles(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t workerId, simdvector prims[], uint32_t primMask, simdscalari primId, simdscalari viewportIdx)
{
SWR_CONTEXT *pContext = pDC->pContext;
diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h b/src/gallium/drivers/swr/rasterizer/core/clip.h
index f19858f..23a768f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -56,12 +56,8 @@ enum SWR_CLIPCODES
GUARDBAND_BOTTOM = (0x80 << CLIPCODE_SHIFT | 0x8)
};

-#define FRUSTUM_CLIP_MASK (FRUSTUM_LEFT|FRUSTUM_TOP|FRUSTUM_RIGHT|FRUSTUM_BOTTOM|FRUSTUM_NEAR|FRUSTUM_FAR)
#define GUARDBAND_CLIP_MASK (FRUSTUM_NEAR|FRUSTUM_FAR|GUARDBAND_LEFT|GUARDBAND_TOP|GUARDBAND_RIGHT|GUARDBAND_BOTTOM|NEGW)

-void Clip(const float *pTriangle, const float *pAttribs, int numAttribs, float *pOutTriangles,
-  int *numVerts, float *pOutAttribs);
-
INLINE
void ComputeClipCodes(const API_STATE& state, const simdvector& vertex, 
simdscalar& clipCodes, simdscalari viewportIndexes)
{
--
2.7.4



Re: [Mesa-dev] [PATCH mesa] swr: [rasterizer common] fix assert index

2016-10-13 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Oct 12, 2016, at 4:13 PM, Eric Engestrom <e...@engestrom.ch> wrote:

Fixes: b3bd8bb611bb465d2e5e ("swr: [rasterizer core] add support
  for "RAW" surface format")
CovID: 1373647
Signed-off-by: Eric Engestrom <e...@engestrom.ch>
---
src/gallium/drivers/swr/rasterizer/common/formats.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/common/formats.h b/src/gallium/drivers/swr/rasterizer/common/formats.h
index 539e37a..dd5b499 100644
--- a/src/gallium/drivers/swr/rasterizer/common/formats.h
+++ b/src/gallium/drivers/swr/rasterizer/common/formats.h
@@ -248,7 +248,7 @@ extern const SWR_FORMAT_INFO gFormatInfo[NUM_SWR_FORMATS];
/// @param format - SWR format
INLINE const SWR_FORMAT_INFO& GetFormatInfo(SWR_FORMAT format)
{
-SWR_ASSERT(format <= NUM_SWR_FORMATS, "Invalid Surface Format: %d", format);
+SWR_ASSERT(format < NUM_SWR_FORMATS, "Invalid Surface Format: %d", format);
SWR_ASSERT(gFormatInfo[format].name != nullptr, "Invalid Surface Format: %d", format);
return gFormatInfo[format];
}
--
Cheers,
 Eric
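
To spell out why the comparison in the patch above matters: when an
enum's last entry is a count used as the array size, that count is
itself one past the last valid index.  The sketch below uses a made-up
three-entry format enum, not the real SWR_FORMAT list.

```cpp
#include <cassert>

// Hypothetical formats; NUM_FORMATS is the count, not a valid format.
enum Format { FMT_A, FMT_B, FMT_C, NUM_FORMATS };

struct FormatInfo { const char* name; };

// Table with exactly NUM_FORMATS entries: valid indices are 0..NUM_FORMATS-1.
static const FormatInfo gInfo[NUM_FORMATS] = {{"A"}, {"B"}, {"C"}};

// Correct bounds check: strictly less than the count.  Using <= would
// accept NUM_FORMATS and read one element past the end of gInfo.
constexpr bool isValidFormat(int format) {
    return format >= 0 && format < NUM_FORMATS;
}

const FormatInfo& getFormatInfo(int format) {
    assert(isValidFormat(format));
    return gInfo[format];
}
```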




Re: [Mesa-dev] [PATCH] vbo: increase VBO_SAVE_BUFFER_SIZE from 8k to 256k dwords

2016-08-18 Thread Rowley, Timothy O
Don’t think it’ll help Minecraft.  Looking at an apitrace of running around in 
the demo world for a little bit, it only created 66 display lists all of which 
were quite small (less than 100 api entries).

-Tim

> On Aug 18, 2016, at 3:10 AM, Gustaw Smolarczyk  wrote:
> 
> Hi,
> 
> Will this help Minecraft? I believe it currently uses VBOs (if they
> are enabled) and display lists. I might test it when I'm home next
> week.
> 
> Regards,
> Gustaw Smolarczyk
> 
> 2016-08-18 7:26 GMT+02:00 Mathias Fröhlich :
>> Hi,
>> 
>> On Wednesday, 17 August 2016 11:00:48 CEST Tim Rowley wrote:
>>> Increases the performance of legacy geometry-heavy apps
>>> still using display lists.
>> That is my observation too.
>> +1 from my side.
>> 
>> If you need that from me this gets:
>> Reviewed-by: Mathias Fröhlich 
>> 
>> Mathias
>> 
>>> ---
>>> src/mesa/vbo/vbo_save.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/src/mesa/vbo/vbo_save.h b/src/mesa/vbo/vbo_save.h
>>> index 2843b3c..d1d7fb0 100644
>>> --- a/src/mesa/vbo/vbo_save.h
>>> +++ b/src/mesa/vbo/vbo_save.h
>>> @@ -96,7 +96,7 @@ struct vbo_save_vertex_list {
>>>  * likelyhood as it occurs.  No reason we couldn't change usage
>>>  * internally even though this probably isn't allowed for client VBOs?
>>>  */
>>> -#define VBO_SAVE_BUFFER_SIZE (8*1024) /* dwords */
>>> +#define VBO_SAVE_BUFFER_SIZE (256*1024) /* dwords */
>>> #define VBO_SAVE_PRIM_SIZE   128
>>> #define VBO_SAVE_PRIM_MODE_MASK 0x3f
>>> #define VBO_SAVE_PRIM_WEAK  0x40
>>> 
>> 
>> 


Re: [Mesa-dev] [PATCH] configure.ac: add llvm inteljitevents component if enabled

2016-08-25 Thread Rowley, Timothy O
Review ping.

Missed this in my original jitevents patch because I had built llvm as one lib.

> On Aug 2, 2016, at 12:54 PM, Rowley, Timothy O  
> wrote:
> 
> Needed to successfully link llvmpipe or swr when using shared llvm libs.
> ---
> configure.ac | 5 +
> 1 file changed, 5 insertions(+)
> 
> diff --git a/configure.ac b/configure.ac
> index fb4a12a..edbc95b 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2497,6 +2497,11 @@ dnl in LLVM_LIBS.
> 
> if test "x$MESA_LLVM" != x0; then
> 
> +if test "x$HAVE_GALLIUM_LLVMPIPE" = xyes || "x$HAVE_GALLIUM_SWR" = xyes && \
> +$LLVM_CONFIG --components | grep -q inteljitevents ; then
> +   LLVM_COMPONENTS="${LLVM_COMPONENTS} inteljitevents"
> +fi
> +
> if ! $LLVM_CONFIG --libs ${LLVM_COMPONENTS} >/dev/null; then
>AC_MSG_ERROR([Calling ${LLVM_CONFIG} failed])
> fi
> -- 
> 2.7.4
> 



Re: [Mesa-dev] [PATCH] configure.ac: add llvm inteljitevents component if enabled

2016-08-26 Thread Rowley, Timothy O

> On Aug 26, 2016, at 8:30 AM, Emil Velikov  wrote:
> 
> On 2 August 2016 at 18:54, Tim Rowley  wrote:
>> Needed to successfully link llvmpipe or swr when using shared llvm libs.
>> ---
>> configure.ac | 5 +
>> 1 file changed, 5 insertions(+)
>> 
>> diff --git a/configure.ac b/configure.ac
>> index fb4a12a..edbc95b 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -2497,6 +2497,11 @@ dnl in LLVM_LIBS.
>> 
>> if test "x$MESA_LLVM" != x0; then
>> 
>> +if test "x$HAVE_GALLIUM_LLVMPIPE" = xyes || "x$HAVE_GALLIUM_SWR" = xyes && \
>> +$LLVM_CONFIG --components | grep -q inteljitevents ; then
>> +   LLVM_COMPONENTS="${LLVM_COMPONENTS} inteljitevents"
>> +fi
>> +
> Hmm is this something required by newer LLVM, I don't recall seeing
> any issues so far. Are you sure it's required for llvmpipe and swr and
> not r600, radeonsi ?

It’s only needed if you have inteljitevents enabled in your llvm build 
configuration, which is probably somewhat rare as they are hooks for VTune.

> If it's required by everyone please add the hunk just after the
> LLVM_COMPONENTS="engine bitwriter mcjit mcdisassembler, alternatively
> keep it in the specific driver section (see r600/radeonsi).
> In either case there should be a llvm version check imho.

The symbol dependency is created by anyone using gallivm; since this is 
included in libgallium this means it should be common to all gallium drivers.  
I’ll make that change and resend.  IntelJITEvents was built as a separate 
component back to llvm 3.3, which is the earliest configure.ac allows for 
gallium, so I don’t think a version check is needed.

> 
> Thanks
> Emil



Re: [Mesa-dev] swr: RDTSC StopCapture hangs

2016-09-08 Thread Rowley, Timothy O
I’ve seen the bucket capture hang on stop before, but thought it no longer 
happened.  Is there a particular workload that makes this easy to 
trigger?

What workload of glmark2 were you looking at?  Watching it run, most of the 
scenes appear very light in geometry, which would cause more contention for 
work by the BE workers.
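
For reference, the alternative raised in the quoted report below
(sleeping until work arrives instead of spinning) is commonly built on
a condition variable.  The following is a generic sketch of that
pattern, not a description of SWR's thread pool; WorkQueue and
drainWithWorkers are hypothetical names.  Note that close() wakes
every sleeping worker, which is the kind of wake-up a shutdown or
capture-stop path needs in order not to wait on sleeping threads
forever.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Generic blocking work queue: workers sleep while it is empty.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
        }
        cv_.notify_one();  // wake one sleeping worker
    }

    // Signal that no more work will arrive and wake all sleepers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Sleep until an item is available or the queue is closed.
    // Returns false once the queue is closed and fully drained.
    bool pop(int& item) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        item = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> items_;
    bool closed_ = false;
};

// Process all items with numWorkers sleeping workers; returns their sum.
long long drainWithWorkers(const std::vector<int>& items, unsigned numWorkers) {
    WorkQueue queue;
    std::atomic<long long> sum{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < numWorkers; ++i) {
        workers.emplace_back([&queue, &sum] {
            int item = 0;
            while (queue.pop(item)) sum += item;
        });
    }
    for (int item : items) queue.push(item);
    queue.close();
    for (std::thread& t : workers) t.join();
    return sum.load();
}
```

The trade-off is wake-up latency: a sleeping worker reacts more slowly
than a spinning one, which may matter when draws are very small.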

Thanks.

-Tim

> On Sep 8, 2016, at 6:55 AM, Victor Moya del Barrio wrote:
> 
> 
> Again playing with OpenSWR running on a many-core system (the kind OpenSWR
> was originally intended for).
> 
> When RDTSC bucket profiling is enabled, OpenSWR sometimes gets stuck in
> StopCapture.
> 
> StopCapture waits for all the threads to close all their pending buckets
> (it expects threads to be at bucket level 0), but the problem seems to be
> that some threads get stuck at WorkerWaitForThreadEvent (level 1).  Given
> that StopCapture is called from SwrEndFrame (in the API thread), they are
> probably never going to be awakened.
> 
> I don't have a clean solution here because I haven't studied in detail how
> the thread wait/sleep mechanism works (the real problem could be why some
> threads are sleeping and others are not), so for now I just commented out
> the code in StopCapture that expects all threads to be at level 0.
> 
> BTW, based on the RDTSC buckets I see very poor utilization of the threads
> in this system on glmark2.  The BE threads seem to spend most of their
> cycles in a spin loop looking for work through draw contexts and the tiles
> inside them, rather than, say, sleeping until there is real work to be done
> (although there probably should be work, as we want higher FPS); i.e. the BE
> thread gets stuck between the WorkerWorkOnFifoBE and WorkerFoundWork buckets.
> 
> Thread 39 (WORKER)
>  %Tot   %Par   Cycles     CPE   NumEvent  CPE2  NumEvent2  Bucket
>  85.57  85.57  171485899  2243  76423     0     0          WorkerWorkOnFifoBE
>  24.45  28.58  49002850   125648 3900  0  |-> WorkerFoundWork
> 
> 
> Victor



Re: [Mesa-dev] [PATCH 2/2] gallivm: permit use of avx512 instructions on llvm-3.9+

2016-11-03 Thread Rowley, Timothy O

> On Nov 3, 2016, at 7:09 PM, Roland Scheidegger  wrote:
> 
> I'm a bit worried by this.
> We've had some (a lot actually) unpleasant surprises in the past with
> llvm choosing to use instruction sets not appropriate for a given cpu...
> Hence only setting flags we checked ourselves being available, and
> disabling everything else. Not sure if this actually still works though
> given we set the host cpu name…

I’ve experienced the fragility of just using the host cpu name in llvm, though 
I thought it usually fails in the conservative direction.  Doing a little 
googling I see there was recently a bit of a hiccup with skylake accidentally 
using evex encoding.

> We do not want llvm to use evex encoded instructions (with any bit
> width) for llvmpipe at this point on "ordinary" x86 cpus (I'm
> specifically thinking about normal, albeit future, xeons, like
> skylake-ep), as that would be a completely untested path (albeit, as I
> said, I'm not even sure the current code actually really prevents
> that...) - well I suppose you tested it with KNL, which is good to know.
> Enabling it on KNL is fair enough I suppose, but I'm not sure if you can
> detect such cpus easily based on feature, do they lack something which
> the normal cpus have? I guess though since it's otherwise only mattering
> for not yet released cpus we could still fix it up later if necessary…

KNL could be detected by looking for the processor family, but that would 
prevent future avx512 processors from just working.  What would you think of 
replacing the manually maintained list of mattr overrides for x86 with some 
code that just uses what getHostCPUFeatures (based on cpuid) returns?  
Something like this:

#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64)
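   /* Feature name -> "is available" flag, filled in from cpuid. */
   llvm::StringMap<bool> features;
   llvm::sys::getHostCPUFeatures(features);

   /* Emit "+name" for available features and "-name" for the rest. */
   for (StringMapIterator<bool> f = features.begin(); f != features.end(); ++f) {
      MAttrs.push_back(((*f).second ? "+" : "-") + (*f).first().str());
   }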
#endif
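
To illustrate the mattr construction on its own, here is a minimal sketch that
does not depend on LLVM.  The std::map stands in for the feature map that
getHostCPUFeatures fills in, and the names FeatureMap and buildMAttrs are
made up for this example; it is not code from gallivm.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for the feature map filled in by getHostCPUFeatures (assumption:
// feature name -> "available" flag). A std::map keeps the sketch LLVM-free.
using FeatureMap = std::map<std::string, bool>;

// Turn each (feature, available) pair into a "+feature" or "-feature" string.
std::vector<std::string> buildMAttrs(const FeatureMap &features)
{
    std::vector<std::string> mattrs;
    for (const auto &f : features) {
        mattrs.push_back((f.second ? "+" : "-") + f.first);
    }
    return mattrs;
}
```

With a map holding avx2 as available and avx512f as unavailable, this yields
"+avx2" and "-avx512f", which is the form the MAttrs vector expects.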

-Tim

> Roland
> 
> Am 03.11.2016 um 22:29 schrieb Tim Rowley:
>> ---
>> src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 3 +++
>> 1 file changed, 3 insertions(+)
>> 
>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp 
>> b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> index bd4d4d3..bff2198 100644
>> --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp
>> @@ -583,6 +583,8 @@ 
>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>   MAttrs.push_back("-fma");
>>}
>>MAttrs.push_back(util_cpu_caps.has_avx2 ? "+avx2" : "-avx2");
>> +
>> +#if HAVE_LLVM <= 0x0308
>>/* disable avx512 and all subvariants */
>> #if HAVE_LLVM >= 0x0304
>>MAttrs.push_back("-avx512cd");
>> @@ -596,6 +598,7 @@ 
>> lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
>>MAttrs.push_back("-avx512vl");
>> #endif
>> #endif
>> +#endif
>> 
>> #if defined(PIPE_ARCH_PPC)
>>MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec");
>> 
> 



Re: [Mesa-dev] [PATCH 2/2] swr: add support for EXT_depth_bounds_test

2016-11-07 Thread Rowley, Timothy O
We suspect the remaining failure might be due to not quantizing the depth 
bounds min/max values.  That can be addressed in a future patch.
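
As a rough illustration of how missing quantization could produce this kind of
failure, here is a sketch that compares a stored depth against raw and
quantized bounds.  The 24-bit unorm depth format, the round-to-nearest rule,
and all function names are assumptions for the example; none of this is taken
from the swr code.

```cpp
#include <cmath>
#include <cstdint>

// Quantize a [0,1] depth value to an n-bit unsigned-normalized integer,
// rounding to nearest. The bit depth is a parameter; 24 bits is an assumption.
uint32_t quantizeDepth(double z, unsigned bits)
{
    const double maxVal = static_cast<double>((1ull << bits) - 1);
    return static_cast<uint32_t>(std::llround(z * maxVal));
}

// Bounds test that compares the stored (already quantized) depth against
// unquantized bounds.
bool boundsTestRaw(double storedZ, double minZ, double maxZ)
{
    return storedZ >= minZ && storedZ <= maxZ;
}

// Bounds test with depth and both bounds quantized to the same grid first.
bool boundsTestQuantized(double z, double minZ, double maxZ, unsigned bits)
{
    const uint32_t q = quantizeDepth(z, bits);
    return q >= quantizeDepth(minZ, bits) && q <= quantizeDepth(maxZ, bits);
}
```

Under these assumptions, z = 0.5 is stored as 8388608/16777215, which is
slightly above 0.5, so the raw comparison against bounds (0.0, 0.5) fails while
the quantized comparison passes.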

Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 1, 2016, at 3:45 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

This fails one sub-case of the piglit depth_bounds test:

Test 10, bounds=(0.00, 0.50), z=(0.50, 0.50, 0.50, 0.50)
Probe color at (0,20)
 Expected: 255 255 255
 Observed: 26 26 26

I'm blaming it on the floating point bogeyman.

src/gallium/drivers/swr/swr_screen.cpp | 2 +-
src/gallium/drivers/swr/swr_state.cpp  | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 704a684..fa16edd 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -332,7 +332,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
  return 0;
   case PIPE_CAP_DEPTH_BOUNDS_TEST:
-  return 0; // xxx
+  return 1;
   case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
   case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
  return 1;
diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 3e02322..d8a8ee1 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1205,6 +1205,7 @@ swr_update_derived(struct pipe_context *pipe,
  struct pipe_depth_state *depth = &(ctx->depth_stencil->depth);
  struct pipe_stencil_state *stencil = ctx->depth_stencil->stencil;
  SWR_DEPTH_STENCIL_STATE depthStencilState = {{0}};
+  SWR_DEPTH_BOUNDS_STATE depthBoundsState = {0};

  /* XXX, incomplete.  Need to flesh out stencil & alpha test state
  struct pipe_stencil_state *front_stencil =
@@ -1251,6 +1252,11 @@ swr_update_derived(struct pipe_context *pipe,
  depthStencilState.depthTestFunc = swr_convert_depth_func(depth->func);
  depthStencilState.depthWriteEnable = depth->writemask;
  SwrSetDepthStencilState(ctx->swrContext, &depthStencilState);
+
+  depthBoundsState.depthBoundsTestEnable = depth->bounds_test;
+  depthBoundsState.depthBoundsTestMinValue = depth->bounds_min;
+  depthBoundsState.depthBoundsTestMaxValue = depth->bounds_max;
+  SwrSetDepthBoundsState(ctx->swrContext, &depthBoundsState);
   }

   /* Blend State */
--
2.7.3




Re: [Mesa-dev] [PATCH 1/2] swr: [rasterizer core]: set depth hottile when depth bounds test enabled

2016-11-07 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 1, 2016, at 3:45 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/core/api.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 5f941e8..b1a426d 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -950,9 +950,11 @@ void SetupPipeline(DRAW_CONTEXT *pDC)
     // have to check for the special case where depth/stencil test is enabled but depthwrite is disabled.
     pState->state.depthHottileEnable = ((!(pState->state.depthStencilState.depthTestEnable &&
                                            !pState->state.depthStencilState.depthWriteEnable &&
+                                           !pState->state.depthBoundsState.depthBoundsTestEnable &&
                                            pState->state.depthStencilState.depthTestFunc == ZFUNC_ALWAYS)) &&
                                         (pState->state.depthStencilState.depthTestEnable ||
-                                         pState->state.depthStencilState.depthWriteEnable)) ? true : false;
+                                         pState->state.depthStencilState.depthWriteEnable ||
+                                         pState->state.depthBoundsState.depthBoundsTestEnable)) ? true : false;
 
     pState->state.stencilHottileEnable = (((!(pState->state.depthStencilState.stencilTestEnable &&
                                               !pState->state.depthStencilState.stencilWriteEnable &&
--
2.7.3
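
The condition in the patch above is easier to read when split into named
parts.  The following is a simplified sketch; the DepthState struct and the
depthFuncIsAlways flag are stand-ins invented for this example, not the real
swr state layout.

```cpp
// Hypothetical, simplified stand-in for the state fields used by the patch.
struct DepthState {
    bool depthTestEnable;
    bool depthWriteEnable;
    bool depthBoundsTestEnable;
    bool depthFuncIsAlways;  // stands in for depthTestFunc == ZFUNC_ALWAYS
};

// The depth hottile is needed unless the depth test is a no-op (ALWAYS, no
// write, no bounds test), and only if something actually uses depth.
bool depthHottileNeeded(const DepthState &s)
{
    const bool trivialTest = s.depthTestEnable && !s.depthWriteEnable &&
                             !s.depthBoundsTestEnable && s.depthFuncIsAlways;
    const bool anyUse = s.depthTestEnable || s.depthWriteEnable ||
                        s.depthBoundsTestEnable;
    return !trivialTest && anyUse;
}
```

With only the bounds test enabled the function returns true, which is the case
the patch adds; before the change that combination left the hottile disabled.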




Re: [Mesa-dev] [PATCH 1/3] swr: fix AND_INVERTED logic op conversion

2016-11-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 7, 2016, at 6:18 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_state.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_state.h 
b/src/gallium/drivers/swr/swr_state.h
index 0e3b49d..8409114 100644
--- a/src/gallium/drivers/swr/swr_state.h
+++ b/src/gallium/drivers/swr/swr_state.h
@@ -106,7 +106,7 @@ swr_convert_logic_op(const UINT op)
   case PIPE_LOGICOP_NOR:
  return LOGICOP_NOR;
   case PIPE_LOGICOP_AND_INVERTED:
-  return LOGICOP_CLEAR;
+  return LOGICOP_AND_INVERTED;
   case PIPE_LOGICOP_COPY_INVERTED:
  return LOGICOP_COPY_INVERTED;
   case PIPE_LOGICOP_AND_REVERSE:
--
2.7.3
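
For reference, a small sketch of why the old mapping gave wrong results.  It
uses the OpenGL definitions of the operations (AND_INVERTED is ~s & d,
AND_REVERSE is s & ~d); the LogicOp enum and applyLogicOp are names made up
for this example and are not the swr or gallium enums.

```cpp
#include <cstdint>

// Illustrative subset of logic ops; not the swr or gallium enum.
enum class LogicOp { Clear, AndInverted, AndReverse };

// Apply a logic op to one source/destination pair of integer channel values.
uint32_t applyLogicOp(LogicOp op, uint32_t src, uint32_t dst)
{
    switch (op) {
    case LogicOp::Clear:       return 0u;
    case LogicOp::AndInverted: return ~src & dst;
    case LogicOp::AndReverse:  return src & ~dst;
    }
    return 0u;
}
```

For src = 0x0F and dst = 0xFF, AND_INVERTED gives 0xF0, whereas the CLEAR op
that the old conversion selected always gives 0.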




Re: [Mesa-dev] [PATCH 2/3] swr: disable logic op when the rt format is float

2016-11-08 Thread Rowley, Timothy O
Looking at the spec, it seems this should also check for sRGB and disable the
logic op in that case (“GetFormatInfo(compileState.format).isSRGB”).

> On Nov 7, 2016, at 6:18 PM, Ilia Mirkin  wrote:
> 
> Signed-off-by: Ilia Mirkin 
> ---
> src/gallium/drivers/swr/swr_state.cpp | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/src/gallium/drivers/swr/swr_state.cpp 
> b/src/gallium/drivers/swr/swr_state.cpp
> index d8a8ee1..acb0452 100644
> --- a/src/gallium/drivers/swr/swr_state.cpp
> +++ b/src/gallium/drivers/swr/swr_state.cpp
> @@ -1305,6 +1305,11 @@ swr_update_derived(struct pipe_context *pipe,
>&ctx->blend->compileState[target],
>sizeof(compileState.blendState));
> 
> +if (compileState.blendState.logicOpEnable &&
> +GetFormatInfo(compileState.format).type[0] == 
> SWR_TYPE_FLOAT) {
> +   compileState.blendState.logicOpEnable = false;
> +}
> +
> if (compileState.blendState.blendEnable == false &&
> compileState.blendState.logicOpEnable == false) {
>SwrSetBlendFunc(ctx->swrContext, target, NULL);
> -- 
> 2.7.3
> 



Re: [Mesa-dev] [PATCH v2] swr: disable logic op when the rt format is float or srgb

2016-11-08 Thread Rowley, Timothy O
I’d prefer parentheses to clarify the logic: "(foo && ((bar == bla) || footer))".

With those added, Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>
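
A sketch of the fully parenthesized form, written as a standalone predicate.
ChannelType, FormatInfo and logicOpAllowed are illustrative names for this
example only; the real code uses SWR_FORMAT_INFO and modifies
compileState.blendState.logicOpEnable in place.

```cpp
// Illustrative stand-ins for the format description; not the swr types.
enum class ChannelType { Unorm, Snorm, Uint, Sint, Float };

struct FormatInfo {
    ChannelType type0;  // type of the first channel
    bool isSRGB;
};

// Logic op stays enabled only if it was requested and the render target is
// neither float nor sRGB. The extra parentheses make the grouping explicit.
bool logicOpAllowed(bool logicOpEnable, const FormatInfo &info)
{
    return (logicOpEnable &&
            !((info.type0 == ChannelType::Float) || info.isSRGB));
}
```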

On Nov 8, 2016, at 4:30 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_state.cpp | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index d8a8ee1..d16c307 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1305,6 +1305,12 @@ swr_update_derived(struct pipe_context *pipe,
   &ctx->blend->compileState[target],
   sizeof(compileState.blendState));

+const SWR_FORMAT_INFO& info = GetFormatInfo(compileState.format);
+if (compileState.blendState.logicOpEnable &&
+(info.type[0] == SWR_TYPE_FLOAT || info.isSRGB)) {
+   compileState.blendState.logicOpEnable = false;
+}
+
if (compileState.blendState.blendEnable == false &&
compileState.blendState.logicOpEnable == false) {
   SwrSetBlendFunc(ctx->swrContext, target, NULL);
--
2.7.3




Re: [Mesa-dev] [PATCH 3/3] swr: [rasterizer jitter] fix logic op to work with unorm/snorm

2016-11-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 7, 2016, at 6:18 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Most logic op usage is probably going to end up with normalized
textures. Scale the floating point values and convert to integer before
performing the logic operations.
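
As a scalar illustration of that idea (the patch itself works on SIMD values
through the JIT builder), here is a sketch for a single unorm channel.  The
function name, the restriction to 1..24 bits per channel, and the conversion
back to float by dividing by the scale are assumptions made for this example.

```cpp
#include <cstdint>

// XOR two unorm channel values held as floats in [0,1].
// Assumes 1 <= bpc <= 24 so the scaled value is exactly representable.
float xorUnorm(float src, float dst, unsigned bpc)
{
    const uint32_t mask = (1u << bpc) - 1u;        // e.g. 0xFF for 8 bits
    const float scale = static_cast<float>(mask);  // unorm scale factor

    // Scale to the integer range and truncate, then apply the logic op.
    const uint32_t s = static_cast<uint32_t>(src * scale);
    const uint32_t d = static_cast<uint32_t>(dst * scale);
    const uint32_t r = (s ^ d) & mask;

    // Convert the integer result back to a normalized float.
    return static_cast<float>(r) / scale;
}
```

For 8-bit channels, 1.0 XOR 1.0 gives 0.0 and 1.0 XOR 0.0 gives 1.0, which is
the result one would expect from operating on the stored integer values.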

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

The gl-1.1-xor-copypixels test still fails. The image stays the same. I'm
suspecting it's for reasons outside of this patch.

I'm not too familiar with the whole swr infrastructure, perhaps there was
an easier way to do all this. I looked for conversion helper functions but
couldn't find anything that would fit nicely here. Feel free to point me
in the right direction.

.../drivers/swr/rasterizer/jitter/blend_jit.cpp| 81 +-
1 file changed, 64 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
index 1452d27..d69d503 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
@@ -649,29 +649,54 @@ struct BlendJit : public Builder
if(state.blendState.logicOpEnable)
{
const SWR_FORMAT_INFO& info = GetFormatInfo(state.format);
-SWR_ASSERT(info.type[0] == SWR_TYPE_UINT);
Value* vMask[4];
+float scale[4];
+
+if (!state.blendState.blendEnable) {
+Clamp(state.format, src);
+Clamp(state.format, dst);
+}
+
for(uint32_t i = 0; i < 4; i++)
{
-switch(info.bpc[i])
+if (info.type[i] == SWR_TYPE_UNUSED)
{
-case 0: vMask[i] = VIMMED1(0x00000000); break;
-case 2: vMask[i] = VIMMED1(0x00000003); break;
-case 5: vMask[i] = VIMMED1(0x0000001F); break;
-case 6: vMask[i] = VIMMED1(0x0000003F); break;
-case 8: vMask[i] = VIMMED1(0x000000FF); break;
-case 10: vMask[i] = VIMMED1(0x000003FF); break;
-case 11: vMask[i] = VIMMED1(0x000007FF); break;
-case 16: vMask[i] = VIMMED1(0x0000FFFF); break;
-case 24: vMask[i] = VIMMED1(0x00FFFFFF); break;
-case 32: vMask[i] = VIMMED1(0xFFFFFFFF); break;
+continue;
+}
+
+if (info.bpc[i] >= 32) {
+vMask[i] = VIMMED1(0xFFFFFFFF);
+scale[i] = 0xFFFFFFFF;
+} else {
+vMask[i] = VIMMED1((1 << info.bpc[i]) - 1);
+if (info.type[i] == SWR_TYPE_SNORM)
+scale[i] = (1 << (info.bpc[i] - 1)) - 1;
+else
+scale[i] = (1 << info.bpc[i]) - 1;
+}
+
+switch (info.type[i]) {
default:
-vMask[i] = VIMMED1(0x0);
-SWR_ASSERT(0, "Unsupported bpc for logic op\n");
+SWR_ASSERT(0, "Unsupported type for logic op\n");
+/* fallthrough */
+case SWR_TYPE_UINT:
+case SWR_TYPE_SINT:
+src[i] = BITCAST(src[i], mSimdInt32Ty);
+dst[i] = BITCAST(dst[i], mSimdInt32Ty);
+break;
+case SWR_TYPE_SNORM:
+src[i] = FADD(src[i], VIMMED1(0.5f));
+dst[i] = FADD(dst[i], VIMMED1(0.5f));
+/* fallthrough */
+case SWR_TYPE_UNORM:
+src[i] = FP_TO_UI(
+FMUL(src[i], VIMMED1(scale[i])),
+mSimdInt32Ty);
+dst[i] = FP_TO_UI(
+FMUL(dst[i], VIMMED1(scale[i])),
+mSimdInt32Ty);
break;
}
-src[i] = BITCAST(src[i], mSimdInt32Ty);//, vMask[i]);
-dst[i] = BITCAST(dst[i], mSimdInt32Ty);
}

LogicOpFunc(state.blendState.logicOpFunc, src, dst, result);
@@ -679,10 +704,32 @@ struct BlendJit : public Builder
// store results out
for(uint32_t i = 0; i < 4; ++i)
{
+if (info.type[i] == SWR_TYPE_UNUSED)
+{
+continue;
+}
+
// clear upper bits from PS output not in RT format after doing 
logic op
result[i] = AND(result[i], vMask[i]);

-STORE(BITCAST(result[i], mSimdFP32Ty), pResult, {i});
+switch (info.type[i]) {
+default:
+SWR_ASSERT(0, "Unsupported type for logic op\n");
+/* fallthrough */
+case SWR_TYPE_UINT:
+case SWR_TYPE_SINT:
+result[i] = BITCAST(resu

Re: [Mesa-dev] [PATCH 1/2] swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping

2016-11-09 Thread Rowley, Timothy O
Yes, that was about to be my suggestion too - remove the driverType use in the
clipper and use the “clipHalfZ” flag instead.

driverType is still used in the SwrSetViewports setup, so can’t be completely 
removed right now.

-Tim

> On Nov 9, 2016, at 10:04 AM, Ilia Mirkin  wrote:
> 
> On Wed, Nov 9, 2016 at 1:21 AM, Ilia Mirkin  wrote:
>> With ARB_clip_control, GL may also do 0..1 depth clipping, not just
>> -1..1. For backwards compatibility, preserve the existing driver type
>> check for DX as well.
> 
> Oh. An even better idea would be to update SwrSetRasterizer to
> magically flip that bit on when driverType == DX at the API level. And
> drop all that driver type stuff from the core otherwise. IMHO that's
> much cleaner. Let me know if you think this will cause problems.
> 
> At least in gallium, we've avoided making any GL vs DX distinctions.
> It seems like a good thing to do in swr core as well.
> 
> Cheers,
> 
>  -ilia
> 
>> 
>> Signed-off-by: Ilia Mirkin 
>> ---
>> src/gallium/drivers/swr/rasterizer/core/clip.h  | 6 +++---
>> src/gallium/drivers/swr/rasterizer/core/state.h | 1 +
>> 2 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
>> b/src/gallium/drivers/swr/rasterizer/core/clip.h
>> index 43bc522..78dbcf0 100644
>> --- a/src/gallium/drivers/swr/rasterizer/core/clip.h
>> +++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
>> @@ -90,7 +90,7 @@ void ComputeClipCodes(DRIVER_TYPE type, const API_STATE& 
>> state, const simdvector
>> {
>> // FRUSTUM_NEAR
>> // DX clips depth [0..w], GL clips [-w..w]
>> -if (type == DX)
>> +if (type == DX || state.rastState.clipHalfZ)
>> {
>> vRes = _simd_cmplt_ps(vertex.z, _simd_setzero_ps());
>> }
>> @@ -640,7 +640,7 @@ private:
>> case FRUSTUM_BOTTOM:t = ComputeInterpFactor(_simd_sub_ps(v1[3], 
>> v1[1]), _simd_sub_ps(v2[3], v2[1])); break;
>> case FRUSTUM_NEAR:
>> // DX Znear plane is 0, GL is -w
>> -if (this->driverType == DX)
>> +if (this->driverType == DX || this->state.rastState.clipHalfZ)
>> {
>> t = ComputeInterpFactor(v1[2], v2[2]);
>> }
>> @@ -708,7 +708,7 @@ private:
>> case FRUSTUM_RIGHT: return _simd_cmple_ps(v[0], v[3]);
>> case FRUSTUM_TOP:   return _simd_cmpge_ps(v[1], 
>> _simd_mul_ps(v[3], _simd_set1_ps(-1.0f)));
>> case FRUSTUM_BOTTOM:return _simd_cmple_ps(v[1], v[3]);
>> -case FRUSTUM_NEAR:  return _simd_cmpge_ps(v[2], 
>> this->driverType == DX ? _simd_setzero_ps() : _simd_mul_ps(v[3], 
>> _simd_set1_ps(-1.0f)));
>> +case FRUSTUM_NEAR:  return _simd_cmpge_ps(v[2], 
>> this->driverType == DX || this->state.rastState.clipHalfZ ? 
>> _simd_setzero_ps() : _simd_mul_ps(v[3], _simd_set1_ps(-1.0f)));
>> case FRUSTUM_FAR:   return _simd_cmple_ps(v[2], v[3]);
>> default:
>> SWR_ASSERT(false, "invalid clipping plane: %d", ClippingPlane);
>> diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h 
>> b/src/gallium/drivers/swr/rasterizer/core/state.h
>> index 93e4565..5ee12e8 100644
>> --- a/src/gallium/drivers/swr/rasterizer/core/state.h
>> +++ b/src/gallium/drivers/swr/rasterizer/core/state.h
>> @@ -932,6 +932,7 @@ struct SWR_RASTSTATE
>> uint32_t frontWinding   : 1;
>> uint32_t scissorEnable  : 1;
>> uint32_t depthClipEnable: 1;
>> +uint32_t clipHalfZ  : 1;
>> uint32_t pointParam : 1;
>> uint32_t pointSpriteEnable  : 1;
>> uint32_t pointSpriteTopOrigin   : 1;
>> --
>> 2.7.3
>> 



Re: [Mesa-dev] [PATCH 1/2] swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping

2016-11-09 Thread Rowley, Timothy O

> On Nov 9, 2016, at 10:16 AM, Ilia Mirkin  wrote:
> 
> On Wed, Nov 9, 2016 at 11:12 AM, Rowley, Timothy O wrote:
>> Yes, that was about to be my suggestion too - remove the drivertype use in 
>> the clipper and use the “clipHalfZ” flag instead.
>> 
>> driverType is still used in the SwrSetViewports setup, so can’t be 
>> completely removed right now.
> 
> Right. I was thinking of that as the "API" level, vs all the actual
> rasterization/etc logic as the "core". Perhaps you have different
> terminology.

We talk about the Swr* functions in api.h/cpp as the API level, and below that 
the core.

> Unsurprisingly, that last usage will also be about halfz, so if you
> can update your DX frontend to just set that bit, we could get rid of
> the driverType entirely. Not sure what your other users of swr are, so
> perhaps that's not a reasonable request.

I’ll talk to developers here about this last usage.  First thought is that we 
wouldn’t want SwrSetViewports dependent on rastState, but if the viewport setup 
was done lazily at SetupPipeline time then using the clip flag could work.

-Tim



Re: [Mesa-dev] [PATCH v2] swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping

2016-11-09 Thread Rowley, Timothy O
SwrSetRastState shouldn’t be overriding what the caller passed in.  clip.h 
changes look good.

> On Nov 9, 2016, at 10:56 AM, Ilia Mirkin  wrote:
> 
> With ARB_clip_control, GL may also do 0..1 depth clipping, not just
> -1..1. This removes clip's reliance on driver type. Instead we force
> halfz to on for DX driver types at the API layer.
> 
> Signed-off-by: Ilia Mirkin 
> ---
> 
> v1 -> v2: remove driverType from clip, set halfz at API layer
> 
> src/gallium/drivers/swr/rasterizer/core/api.cpp |  4 ++++
> src/gallium/drivers/swr/rasterizer/core/clip.h  | 13 ++++++-------
> src/gallium/drivers/swr/rasterizer/core/state.h |  1 +
> 3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/api.cpp
> index b1a426d..70bd6a8 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
> @@ -701,6 +701,10 @@ void SwrSetRastState(
> API_STATE* pState = GetDrawState(pContext);
> 
> memcpy(&pState->rastState, pRastState, sizeof(SWR_RASTSTATE));
> +if (pContext->driverType == DX)
> +{
> +pState->rastState.clipHalfZ = 1;
> +}
> }
> 
> void SwrSetViewports(
> diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
> b/src/gallium/drivers/swr/rasterizer/core/clip.h
> index 43bc522..3d86b28 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/clip.h
> +++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
> @@ -63,7 +63,7 @@ void Clip(const float *pTriangle, const float *pAttribs, 
> int numAttribs, float *
>   int *numVerts, float *pOutAttribs);
> 
> INLINE
> -void ComputeClipCodes(DRIVER_TYPE type, const API_STATE& state, const 
> simdvector& vertex, simdscalar& clipCodes, simdscalari viewportIndexes)
> +void ComputeClipCodes(const API_STATE& state, const simdvector& vertex, 
> simdscalar& clipCodes, simdscalari viewportIndexes)
> {
> clipCodes = _simd_setzero_ps();
> 
> @@ -90,7 +90,7 @@ void ComputeClipCodes(DRIVER_TYPE type, const API_STATE& 
> state, const simdvector
> {
> // FRUSTUM_NEAR
> // DX clips depth [0..w], GL clips [-w..w]
> -if (type == DX)
> +if (state.rastState.clipHalfZ)
> {
> vRes = _simd_cmplt_ps(vertex.z, _simd_setzero_ps());
> }
> @@ -135,7 +135,7 @@ class Clipper
> {
> public:
> Clipper(uint32_t in_workerId, DRAW_CONTEXT* in_pDC) :
> -workerId(in_workerId), driverType(in_pDC->pContext->driverType), 
> pDC(in_pDC), state(GetApiState(in_pDC))
> +workerId(in_workerId), pDC(in_pDC), state(GetApiState(in_pDC))
> {
> static_assert(NumVertsPerPrim >= 1 && NumVertsPerPrim <= 3, "Invalid 
> NumVertsPerPrim");
> }
> @@ -144,7 +144,7 @@ public:
> {
> for (uint32_t i = 0; i < NumVertsPerPrim; ++i)
> {
> -::ComputeClipCodes(this->driverType, this->state, vertex[i], 
> this->clipCodes[i], viewportIndexes);
> +::ComputeClipCodes(this->state, vertex[i], this->clipCodes[i], 
> viewportIndexes);
> }
> }
> 
> @@ -640,7 +640,7 @@ private:
> case FRUSTUM_BOTTOM:t = ComputeInterpFactor(_simd_sub_ps(v1[3], 
> v1[1]), _simd_sub_ps(v2[3], v2[1])); break;
> case FRUSTUM_NEAR:  
> // DX Znear plane is 0, GL is -w
> -if (this->driverType == DX)
> +if (this->state.rastState.clipHalfZ)
> {
> t = ComputeInterpFactor(v1[2], v2[2]);
> }
> @@ -708,7 +708,7 @@ private:
> case FRUSTUM_RIGHT: return _simd_cmple_ps(v[0], v[3]);
> case FRUSTUM_TOP:   return _simd_cmpge_ps(v[1], 
> _simd_mul_ps(v[3], _simd_set1_ps(-1.0f)));
> case FRUSTUM_BOTTOM:return _simd_cmple_ps(v[1], v[3]);
> -case FRUSTUM_NEAR:  return _simd_cmpge_ps(v[2], this->driverType 
> == DX ? _simd_setzero_ps() : _simd_mul_ps(v[3], _simd_set1_ps(-1.0f)));
> +case FRUSTUM_NEAR:  return _simd_cmpge_ps(v[2], 
> this->state.rastState.clipHalfZ ? _simd_setzero_ps() : _simd_mul_ps(v[3], 
> _simd_set1_ps(-1.0f)));
> case FRUSTUM_FAR:   return _simd_cmple_ps(v[2], v[3]);
> default:
> SWR_ASSERT(false, "invalid clipping plane: %d", ClippingPlane);
> @@ -942,7 +942,6 @@ private:
> }
> 
> const uint32_t workerId{ 0 };
> -const DRIVER_TYPE driverType{ DX };
> DRAW_CONTEXT* pDC{ nullptr };
> const API_STATE& state;
> simdscalar clipCodes[NumVertsPerPrim];
> diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h 
> b/src/gallium/drivers/swr/rasterizer/core/state.h
> index 93e4565..5ee12e8 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/state.h
> +++ b/src/gallium/drivers/swr/rasterizer/core/state.h
> @@ -932,6 +932,7 @@ struct SWR_RASTSTATE
> uint32_t frontWinding   : 1;
> uint32_t scissorEnable  : 1;
> uint32_t depthClipEnable: 1;
> +uint32_t clipHalfZ   

Re: [Mesa-dev] [PATCH v2] swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping

2016-11-09 Thread Rowley, Timothy O

> On Nov 9, 2016, at 11:03 AM, Ilia Mirkin  wrote:
> 
> On Wed, Nov 9, 2016 at 12:03 PM, Rowley, Timothy O wrote:
>> SwrSetRastState shouldn’t be overriding what the caller passed in.  clip.h 
>> changes look good.
> 
> Then how should I force halfZ to 1 for DX?

The user of the swr api will set clipHalfZ appropriately in the SWR_RASTSTATE 
that they pass in.
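
In other words, the core only looks at the flag and never at the driver type.
A minimal scalar sketch of the near-plane check under that scheme follows;
RastState and outsideNear are simplified stand-ins invented for this example
(the real code is SIMD and lives in clip.h).

```cpp
// Hypothetical, reduced version of the rasterizer state; the caller fills in
// clipHalfZ (true for DX-style [0..w] depth, false for GL-style [-w..w]).
struct RastState {
    bool clipHalfZ;
};

// True when a vertex lies outside the near clipping plane.
bool outsideNear(const RastState &rs, float z, float w)
{
    return rs.clipHalfZ ? (z < 0.0f) : (z < -w);
}
```

A vertex with z = -0.5 and w = 1.0 is clipped when the caller sets clipHalfZ
and kept otherwise, so a DX front end would set the flag itself and a GL front
end would set it only when ARB_clip_control requests 0..1 depth.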



Re: [Mesa-dev] [PATCH] swr: [rasterizer] add a .dir-locals.el to support 4-space indents

2016-11-09 Thread Rowley, Timothy O
A couple of good additions would be "(indent-tabs-mode . nil)" and
"(show-trailing-whitespace . t)".

With that, Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 9, 2016, at 11:08 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/.dir-locals.el | 6 ++++++
1 file changed, 6 insertions(+)
create mode 100644 src/gallium/drivers/swr/rasterizer/.dir-locals.el

diff --git a/src/gallium/drivers/swr/rasterizer/.dir-locals.el 
b/src/gallium/drivers/swr/rasterizer/.dir-locals.el
new file mode 100644
index 0000000..63613a9
--- /dev/null
+++ b/src/gallium/drivers/swr/rasterizer/.dir-locals.el
@@ -0,0 +1,6 @@
+((prog-mode
+  (c-basic-offset . 4)
+  (c-file-style . "k&r")
+  (fill-column . 78)
+  )
+ )
--
2.7.3




Re: [Mesa-dev] [PATCH v2] swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping

2016-11-09 Thread Rowley, Timothy O

> On Nov 9, 2016, at 11:09 AM, Ilia Mirkin  wrote:
> 
> On Wed, Nov 9, 2016 at 12:06 PM, Rowley, Timothy O wrote:
>> 
>>> On Nov 9, 2016, at 11:03 AM, Ilia Mirkin wrote:
>>> 
>>> On Wed, Nov 9, 2016 at 12:03 PM, Rowley, Timothy O wrote:
>>>> SwrSetRastState shouldn’t be overriding what the caller passed in.  clip.h 
>>>> changes look good.
>>> 
>>> Then how should I force halfZ to 1 for DX?
>> 
>> The user of the swr api will set clipHalfZ appropriately in the 
>> SWR_RASTSTATE that they pass in.
> 
> Oh, that's nice. I just didn't want to break existing users. But if
> that's an OK thing to do, even better :)

We can adjust users to API changes that clean things up.

> Will drop that bit from my commit. Want me to resend?

If you could, yes.

-Tim



Re: [Mesa-dev] [PATCH v3] swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping

2016-11-09 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 9, 2016, at 11:50 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

With ARB_clip_control, GL may also do 0..1 depth clipping, not just
-1..1. This removes clip's reliance on driver type. DX users will need
to be updated to set the new clipHalfZ flag to get proper clipping
functionality.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

v2 -> v3: drop the api-level setting of clipHalfZ for DX driverType.

src/gallium/drivers/swr/rasterizer/core/clip.h  | 13 ++++++-------
src/gallium/drivers/swr/rasterizer/core/state.h |  1 +
2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 43bc522..3d86b28 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -63,7 +63,7 @@ void Clip(const float *pTriangle, const float *pAttribs, int 
numAttribs, float *
  int *numVerts, float *pOutAttribs);

INLINE
-void ComputeClipCodes(DRIVER_TYPE type, const API_STATE& state, const 
simdvector& vertex, simdscalar& clipCodes, simdscalari viewportIndexes)
+void ComputeClipCodes(const API_STATE& state, const simdvector& vertex, 
simdscalar& clipCodes, simdscalari viewportIndexes)
{
clipCodes = _simd_setzero_ps();

@@ -90,7 +90,7 @@ void ComputeClipCodes(DRIVER_TYPE type, const API_STATE& 
state, const simdvector
{
// FRUSTUM_NEAR
// DX clips depth [0..w], GL clips [-w..w]
-if (type == DX)
+if (state.rastState.clipHalfZ)
{
vRes = _simd_cmplt_ps(vertex.z, _simd_setzero_ps());
}
@@ -135,7 +135,7 @@ class Clipper
{
public:
Clipper(uint32_t in_workerId, DRAW_CONTEXT* in_pDC) :
-workerId(in_workerId), driverType(in_pDC->pContext->driverType), 
pDC(in_pDC), state(GetApiState(in_pDC))
+workerId(in_workerId), pDC(in_pDC), state(GetApiState(in_pDC))
{
static_assert(NumVertsPerPrim >= 1 && NumVertsPerPrim <= 3, "Invalid 
NumVertsPerPrim");
}
@@ -144,7 +144,7 @@ public:
{
for (uint32_t i = 0; i < NumVertsPerPrim; ++i)
{
-::ComputeClipCodes(this->driverType, this->state, vertex[i], 
this->clipCodes[i], viewportIndexes);
+::ComputeClipCodes(this->state, vertex[i], this->clipCodes[i], 
viewportIndexes);
}
}

@@ -640,7 +640,7 @@ private:
case FRUSTUM_BOTTOM:t = ComputeInterpFactor(_simd_sub_ps(v1[3], 
v1[1]), _simd_sub_ps(v2[3], v2[1])); break;
case FRUSTUM_NEAR:
// DX Znear plane is 0, GL is -w
-if (this->driverType == DX)
+if (this->state.rastState.clipHalfZ)
{
t = ComputeInterpFactor(v1[2], v2[2]);
}
@@ -708,7 +708,7 @@ private:
case FRUSTUM_RIGHT: return _simd_cmple_ps(v[0], v[3]);
case FRUSTUM_TOP:   return _simd_cmpge_ps(v[1], _simd_mul_ps(v[3], 
_simd_set1_ps(-1.0f)));
case FRUSTUM_BOTTOM:return _simd_cmple_ps(v[1], v[3]);
-case FRUSTUM_NEAR:  return _simd_cmpge_ps(v[2], this->driverType 
== DX ? _simd_setzero_ps() : _simd_mul_ps(v[3], _simd_set1_ps(-1.0f)));
+case FRUSTUM_NEAR:  return _simd_cmpge_ps(v[2], 
this->state.rastState.clipHalfZ ? _simd_setzero_ps() : _simd_mul_ps(v[3], 
_simd_set1_ps(-1.0f)));
case FRUSTUM_FAR:   return _simd_cmple_ps(v[2], v[3]);
default:
SWR_ASSERT(false, "invalid clipping plane: %d", ClippingPlane);
@@ -942,7 +942,6 @@ private:
}

const uint32_t workerId{ 0 };
-const DRIVER_TYPE driverType{ DX };
DRAW_CONTEXT* pDC{ nullptr };
const API_STATE& state;
simdscalar clipCodes[NumVertsPerPrim];
diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h 
b/src/gallium/drivers/swr/rasterizer/core/state.h
index 93e4565..5ee12e8 100644
--- a/src/gallium/drivers/swr/rasterizer/core/state.h
+++ b/src/gallium/drivers/swr/rasterizer/core/state.h
@@ -932,6 +932,7 @@ struct SWR_RASTSTATE
uint32_t frontWinding   : 1;
uint32_t scissorEnable  : 1;
uint32_t depthClipEnable: 1;
+uint32_t clipHalfZ  : 1;
uint32_t pointParam : 1;
uint32_t pointSpriteEnable  : 1;
uint32_t pointSpriteTopOrigin   : 1;
--
2.7.3
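
With this change the near-plane test depends on rasterizer state (clipHalfZ) rather than on the driver type. A minimal scalar sketch of that test follows, assuming a D3D-style near plane at z = 0 when clipHalfZ is set and a GL-style near plane at z = -w otherwise. ClipVertex and InsideNearPlane are hypothetical names used only for illustration; the real clipper works on SIMD registers.

```cpp
#include <cassert>

// Hypothetical stand-in for one clip-space vertex; not an SWR type.
struct ClipVertex { float x, y, z, w; };

// Returns true when the vertex lies on the visible side of the near plane.
//   clipHalfZ == true  : depth range [0, w], near plane at z = 0
//   clipHalfZ == false : depth range [-w, w], near plane at z = -w
inline bool InsideNearPlane(const ClipVertex& v, bool clipHalfZ)
{
    const float nearLimit = clipHalfZ ? 0.0f : -v.w;
    return v.z >= nearLimit;
}
```

A vertex with z = -0.5 and w = 1 is kept under the GL convention and clipped under the half-Z convention, which is why the choice has to follow API state.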


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swr: fix support for inverted depth scales

2016-11-09 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 8, 2016, at 11:03 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

This improves bin/arb_clip_control-clip-control results, but still not
quite there yet.

src/gallium/drivers/swr/swr_state.cpp | 10 +++---
1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index ede475a..01cadce 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -38,6 +38,7 @@
#include "util/u_inlines.h"
#include "util/u_helpers.h"
#include "util/u_framebuffer.h"
+#include "util/u_viewport.h"

#include "swr_state.h"
#include "swr_context.h"
@@ -951,13 +952,8 @@ swr_update_derived(struct pipe_context *pipe,
  vp->width = state->translate[0] + state->scale[0];
  vp->y = state->translate[1] - fabs(state->scale[1]);
  vp->height = state->translate[1] + fabs(state->scale[1]);
-  if (rasterizer->clip_halfz == 0) {
- vp->minZ = state->translate[2] - state->scale[2];
- vp->maxZ = state->translate[2] + state->scale[2];
-  } else {
- vp->minZ = state->translate[2];
- vp->maxZ = state->translate[2] + state->scale[2];
-  }
+  util_viewport_zmin_zmax(state, rasterizer->clip_halfz,
+  &vp->minZ, &vp->maxZ);

  vpm->m00[0] = state->scale[0];
  vpm->m11[0] = state->scale[1];
--
2.7.3
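
The body of util_viewport_zmin_zmax is not shown in this patch. The sketch below is an assumption about what such a helper has to do to cope with an inverted (negative) depth scale: compute both ends of the depth range, then order them. ViewportZ and ViewportZMinMax are hypothetical stand-ins, not the Mesa API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical reduced viewport state: only the Z scale and translate terms.
struct ViewportZ { float scaleZ; float translateZ; };

// Computes the depth range covered by the viewport transform.
// The two ends are sorted so that zmin <= zmax even when scaleZ is negative.
inline void ViewportZMinMax(const ViewportZ& vp, bool clipHalfZ,
                            float* zmin, float* zmax)
{
    // Half-Z clip space maps z in [0, w]; full clip space maps z in [-w, w].
    const float a = clipHalfZ ? vp.translateZ : vp.translateZ - vp.scaleZ;
    const float b = vp.translateZ + vp.scaleZ;
    *zmin = std::min(a, b);
    *zmax = std::max(a, b);
}
```

With scaleZ = -0.5, translateZ = 0.5 and clipHalfZ unset, the removed code would have stored minZ = 1.0 and maxZ = 0.0; the sketch returns 0.0 and 1.0.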




Re: [Mesa-dev] [PATCH] swr: correct setting of independentAlphaBlendEnable

2016-11-09 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 9, 2016, at 1:38 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

This setting is for whether color and alpha have different blend
settings, not for whether blending is enabled on a per-RT basis.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

This fixes gl-1.0-blend-func. Not 100% sure why this setting is a thing, as
opposed to just looking at the values directly, but ... wtvr.

src/gallium/drivers/swr/swr_state.cpp | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index d19acfb..65327f3 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1318,7 +1318,12 @@ swr_update_derived(struct pipe_context *pipe,
compileState.desc.alphaTestEnable =
   ctx->depth_stencil->alpha.enabled;
compileState.desc.independentAlphaBlendEnable =
-   ctx->blend->pipe.independent_blend_enable;
+   (compileState.blendState.sourceBlendFactor !=
+compileState.blendState.sourceAlphaBlendFactor) ||
+   (compileState.blendState.destBlendFactor !=
+compileState.blendState.destAlphaBlendFactor) ||
+   (compileState.blendState.colorBlendFunc !=
+compileState.blendState.alphaBlendFunc);
compileState.desc.alphaToCoverageEnable =
   ctx->blend->pipe.alpha_to_coverage;
compileState.desc.sampleMaskEnable = 0; // XXX
--
2.7.3
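
The condition introduced here can be read as a single predicate: the flag is set only when the alpha channel blends differently from the colour channels. A minimal sketch, assuming a reduced blend description; BlendDesc and NeedsIndependentAlphaBlend are hypothetical names and the integer fields stand in for the real enum types.

```cpp
#include <cassert>

// Hypothetical, reduced per-render-target blend description.
struct BlendDesc {
    int srcColorFactor, srcAlphaFactor;
    int dstColorFactor, dstAlphaFactor;
    int colorFunc, alphaFunc;
};

// True when alpha needs its own blend equation, i.e. when any of its
// factors or its function differs from the colour ones.
inline bool NeedsIndependentAlphaBlend(const BlendDesc& b)
{
    return b.srcColorFactor != b.srcAlphaFactor ||
           b.dstColorFactor != b.dstAlphaFactor ||
           b.colorFunc != b.alphaFunc;
}
```

Note that this says nothing about whether different render targets use different blend settings, which is the meaning the old assignment had picked up from independent_blend_enable.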




Re: [Mesa-dev] [PATCH] swr: disallow luminance/intensity, bgr5x1, and bgr10x2 rt formats

2016-11-09 Thread Rowley, Timothy O
While more verbose and a little slower, I think using the 
util_is_intensity/luminance* functions would make it clearer to someone reading 
the code what’s being excluded.

> On Nov 9, 2016, at 2:15 PM, Ilia Mirkin  wrote:
> 
> The rasterizer does not have StoreTile support for these, so just mark
> them as unsupported. They may still be used for texturing if necessary.
> 
> Fixes fbo-blending-formats piglit test.
> 
> Signed-off-by: Ilia Mirkin 
> ---
> src/gallium/drivers/swr/swr_screen.cpp | 14 ++
> 1 file changed, 14 insertions(+)
> 
> diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
> b/src/gallium/drivers/swr/swr_screen.cpp
> index fa16edd..c4b46b9 100644
> --- a/src/gallium/drivers/swr/swr_screen.cpp
> +++ b/src/gallium/drivers/swr/swr_screen.cpp
> @@ -114,6 +114,20 @@ swr_is_format_supported(struct pipe_screen *screen,
>  return FALSE;
> 
>   /*
> +   * Don't allow any luminance/intensity formats
> +   */
> +  if (format_desc->swizzle[0] == format_desc->swizzle[1] &&
> +  format_desc->swizzle[0] != PIPE_SWIZZLE_0)
> + return FALSE;
> +
> +  /*
> +   * There's also currently no support for rendering to BGR5X1 or BGR10X2
> +   */
> +  if (format == PIPE_FORMAT_B5G5R5X1_UNORM ||
> +  format == PIPE_FORMAT_B10G10R10X2_UNORM)
> + return FALSE;
> +
> +  /*
>* Although possible, it is unnatural to render into compressed or YUV
>* surfaces. So disable these here to avoid going into weird paths
>* inside the state trackers.
> -- 
> 2.7.3
> 



Re: [Mesa-dev] [PATCH 01/14] swr: [rasterizer jitter] code style fix

2016-11-09 Thread Rowley, Timothy O
Oh, good spotting.  I had incorporated the earlier version of your patch, which 
used the other “if () {” bracing style.  I’ll remove this change from the 
push.

> On Nov 9, 2016, at 9:38 PM, Ilia Mirkin  wrote:
> 
> What's the preferred style? It seems like every other if () in this
> file has a { starting on the next line. That's why I ended up doing it
> that way as well..
> 
> On Wed, Nov 9, 2016 at 10:18 PM, Tim Rowley  
> wrote:
>> ---
>> src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp 
>> b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
>> index 46ea495..d69d503 100644
>> --- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
>> +++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
>> @@ -652,8 +652,7 @@ struct BlendJit : public Builder
>> Value* vMask[4];
>> float scale[4];
>> 
>> -if (!state.blendState.blendEnable)
>> -{
>> +if (!state.blendState.blendEnable) {
>> Clamp(state.format, src);
>> Clamp(state.format, dst);
>> }
>> --
>> 2.7.4
>> 



Re: [Mesa-dev] [PATCH 08/14] swr: [rasterizer core/jitter] fix alpha test bug

2016-11-09 Thread Rowley, Timothy O
Ah, yes, this patch missed the 8x2 tile path - I’ve fixed that now.  I don’t 
see another path to using the blend jit functions.

Thanks.

On Nov 9, 2016, at 10:44 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:


I think a second instance of the blend func being called was missed in 
backend.h (the avx512 one). Also iirc there was some third place where it was 
called... Should grep for pfnBlendFunc and see if any other instances pop up.

On Nov 9, 2016 10:19 PM, "Tim Rowley" <timothy.o.row...@intel.com> wrote:
Alpha from render target 0 should always be used for alpha test for all
render targets, according to GL and DX9 specs. Previously we were using
alpha from the current render target.
---
 src/gallium/drivers/swr/rasterizer/core/backend.h   |  1 +
 src/gallium/drivers/swr/rasterizer/core/state.h |  6 +-
 src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp | 10 --
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.h 
b/src/gallium/drivers/swr/rasterizer/core/backend.h
index dc0be90..a7018e0 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.h
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.h
@@ -714,6 +714,7 @@ INLINE void OutputMerger(SWR_PS_CONTEXT &psContext, 
uint8_t* (&pColorBase)[SWR_N
 pBlendState,
 psContext.shaded[rt],
 psContext.shaded[1],
+psContext.shaded[0].w,
 sample,
 pColorSample,
 blendOut,
diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h 
b/src/gallium/drivers/swr/rasterizer/core/state.h
index 5ee12e8..24927cd 100644
--- a/src/gallium/drivers/swr/rasterizer/core/state.h
+++ b/src/gallium/drivers/swr/rasterizer/core/state.h
@@ -805,9 +805,13 @@ typedef void(__cdecl *PFN_CS_FUNC)(HANDLE hPrivateData, 
SWR_CS_CONTEXT* pCsConte
 typedef void(__cdecl *PFN_SO_FUNC)(SWR_STREAMOUT_CONTEXT& soContext);
 typedef void(__cdecl *PFN_PIXEL_KERNEL)(HANDLE hPrivateData, SWR_PS_CONTEXT 
*pContext);
 typedef void(__cdecl *PFN_CPIXEL_KERNEL)(HANDLE hPrivateData, SWR_PS_CONTEXT 
*pContext);
-typedef void(__cdecl *PFN_BLEND_JIT_FUNC)(const SWR_BLEND_STATE*, simdvector&, 
simdvector&, uint32_t, uint8_t*, simdvector&, simdscalari*, simdscalari*);
+typedef void(__cdecl *PFN_BLEND_JIT_FUNC)(const SWR_BLEND_STATE*,
+simdvector& vSrc, simdvector& vSrc1, simdscalar& vSrc0Alpha, uint32_t 
sample,
+uint8_t* pDst, simdvector& vResult, simdscalari* vOMask, simdscalari* 
vCoverageMask);
 typedef simdscalar(*PFN_QUANTIZE_DEPTH)(simdscalar);

+
+
 //
 /// FRONTEND_STATE
 /
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
index d69d503..43e3d36 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
@@ -443,10 +443,13 @@ struct BlendJit : public Builder
 }
 }

-void AlphaTest(const BLEND_COMPILE_STATE& state, Value* pBlendState, 
Value* pAlpha, Value* ppMask)
+void AlphaTest(const BLEND_COMPILE_STATE& state, Value* pBlendState, 
Value* ppAlpha, Value* ppMask)
 {
 // load uint32_t reference
 Value* pRef = VBROADCAST(LOAD(pBlendState, { 0, 
SWR_BLEND_STATE_alphaTestReference }));
+
+// load alpha
+Value* pAlpha = LOAD(ppAlpha);

 Value* pTest = nullptr;
 if (state.alphaTestFormat == ALPHA_TEST_UNORM8)
@@ -523,6 +526,7 @@ struct BlendJit : public Builder
 PointerType::get(Gen_SWR_BLEND_STATE(JM()), 0), // SWR_BLEND_STATE*
 PointerType::get(mSimdFP32Ty, 0),   // simdvector& src
 PointerType::get(mSimdFP32Ty, 0),   // simdvector& src1
+PointerType::get(mSimdFP32Ty, 0),   // src0alpha
 Type::getInt32Ty(JM()->mContext),   // sampleNum
 PointerType::get(mSimdFP32Ty, 0),   // uint8_t* pDst
 PointerType::get(mSimdFP32Ty, 0),   // simdvector& 
result
@@ -545,6 +549,8 @@ struct BlendJit : public Builder
 pSrc->setName("src");
 Value* pSrc1 = &*argitr++;
 pSrc1->setName("src1");
+Value* pSrc0Alpha = &*argitr++;
+pSrc0Alpha->setName("src0alpha");
 Value* sampleNum = &*argitr++;
 sampleNum->setName("sampleNum");
 Value* pDst = &*argitr++;
@@ -588,7 +594,7 @@ struct BlendJit : public Builder
 // alpha test
 if (state.desc.alphaTestEnable)
 {
-AlphaTest(state, pBlendState, src[3], ppMask);
+AlphaTest(state, pBlendState, pSrc0Alpha, ppMask);
 }

 // color blend
--
2.7.4
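
The rule stated in the commit message can be sketched in scalar form: whichever render target is being written, the alpha test reads alpha from render target 0. Color and AlphaTestPasses are hypothetical names, and the comparison function is assumed to be greater-or-equal purely for illustration; the real code is JIT-generated and supports other comparison modes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shaded colour for one render target.
struct Color { float r, g, b, a; };

// Alpha test for render target `rt`. The alpha value always comes from
// shaded[0], so every render target gets the same pass/fail result.
inline bool AlphaTestPasses(const Color* shaded, std::size_t rt, float ref)
{
    (void)rt; // the target index deliberately does not select the alpha source
    return shaded[0].a >= ref; // assumed compare function: GEQUAL
}
```

With RT0 alpha = 0.2, RT1 alpha = 0.9 and a reference of 0.5, the fragment is rejected for both targets; the previous behaviour would have let it through on RT1.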


Re: [Mesa-dev] [PATCH 1/3] swr: add support for upper-left fragcoord position

2016-11-15 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 14, 2016, at 7:03 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Fixes glsl-arb-fragment-coord-conventions.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_shader.cpp | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index f639df3..e4f9796 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -500,8 +500,14 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)
 inputs[attrib][3] = wrap(VIMMED1(1.0f));
 continue;
  } else if (semantic_name == TGSI_SEMANTIC_POSITION) { // gl_FragCoord
- inputs[attrib][0] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vX, 
PixelPositions_center}, "vX"));
- inputs[attrib][1] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vY, 
PixelPositions_center}, "vY"));
+ if (swr_fs->info.base.properties[TGSI_PROPERTY_FS_COORD_PIXEL_CENTER] 
==
+ TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER) {
+inputs[attrib][0] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vX, 
PixelPositions_center}, "vX"));
+inputs[attrib][1] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vY, 
PixelPositions_center}, "vY"));
+ } else {
+inputs[attrib][0] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vX, 
PixelPositions_UL}, "vX"));
+inputs[attrib][1] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vY, 
PixelPositions_UL}, "vY"));
+ }
 inputs[attrib][2] = wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vZ}, "vZ"));
 inputs[attrib][3] =
wrap(LOAD(pPS, {0, SWR_PS_CONTEXT_vOneOverW, 
PixelPositions_center}, "vOneOverW"));
--
2.7.3
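
The two branches differ only in which per-pixel position is loaded. A scalar sketch of the resulting gl_FragCoord.x or .y value, assuming the "center" positions are the integer pixel index plus 0.5 and the "UL" positions are the integer pixel index itself. FragCoordComponent is a hypothetical helper, not part of swr.

```cpp
#include <cassert>

// Value seen by the shader for one gl_FragCoord component.
//   halfIntegerCenter == true  : pixel centres at n + 0.5 (the default)
//   halfIntegerCenter == false : pixel centres at integer n
inline float FragCoordComponent(int pixelIndex, bool halfIntegerCenter)
{
    return static_cast<float>(pixelIndex) + (halfIntegerCenter ? 0.5f : 0.0f);
}
```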




Re: [Mesa-dev] [PATCH 2/3] swr: always enable adding start/base vertex to gl_VertexId

2016-11-15 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 14, 2016, at 7:03 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Fixes gl-3.2-basevertex-vertexid

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_state.cpp | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 2c7f3be..8038ef5 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -447,6 +447,7 @@ swr_create_vertex_elements_state(struct pipe_context *pipe,
   assert(num_elements <= PIPE_MAX_ATTRIBS);
   velems = CALLOC_STRUCT(swr_vertex_element_state);
   if (velems) {
+  velems->fsState.bVertexIDOffsetEnable = true;
  velems->fsState.numAttribs = num_elements;
  for (unsigned i = 0; i < num_elements; i++) {
 // XXX: we should do this keyed on the VS usage info
--
2.7.3




Re: [Mesa-dev] [PATCH 3/3] swr: mark color clamping as unsupported

2016-11-15 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 14, 2016, at 7:03 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

There is no functionality in swr to clamp either vertex or frag colors.
This could be added in swr_shader, at which point these could be
re-enabled.

Fixes arb_color_buffer_float-render

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index e52f8d2..0b1d61d 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -235,8 +235,9 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_TEXTURE_BARRIER:
  return 0;
   case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
-   case PIPE_CAP_VERTEX_COLOR_UNCLAMPED: /* draw module */
-   case PIPE_CAP_VERTEX_COLOR_CLAMPED: /* draw module */
+   case PIPE_CAP_VERTEX_COLOR_CLAMPED:
+  return 0;
+   case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
  return 1;
   case PIPE_CAP_MIXED_COLORBUFFER_FORMATS:
  return 1;
--
2.7.3




Re: [Mesa-dev] [PATCH 2/4] swr: [rasterizer core] clear data now comes in as float

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The non-fast-clear path was never updated after clear colors were passed
in as floats. Remove the now-harmful conversion from unorm8.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/core/backend.cpp | 14 --
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
index 37de650..45eff15 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
@@ -285,17 +285,11 @@ void ProcessClearBE(DRAW_CONTEXT *pDC, uint32_t workerId, 
uint32_t macroTile, vo

if (pClear->attachmentMask & SWR_ATTACHMENT_MASK_COLOR)
{
-/// @todo clear data should come in as RGBA32_FLOAT
DWORD clearData[4];
-float clearFloat[4];
-clearFloat[0] = ((uint8_t*)(&pClear->clearRTColor))[0] / 255.0f;
-clearFloat[1] = ((uint8_t*)(&pClear->clearRTColor))[1] / 255.0f;
-clearFloat[2] = ((uint8_t*)(&pClear->clearRTColor))[2] / 255.0f;
-clearFloat[3] = ((uint8_t*)(&pClear->clearRTColor))[3] / 255.0f;
-clearData[0] = *(DWORD*)&clearFloat[0];
-clearData[1] = *(DWORD*)&clearFloat[1];
-clearData[2] = *(DWORD*)&clearFloat[2];
-clearData[3] = *(DWORD*)&clearFloat[3];
+clearData[0] = *(DWORD*)&(pClear->clearRTColor[0]);
+clearData[1] = *(DWORD*)&(pClear->clearRTColor[1]);
+clearData[2] = *(DWORD*)&(pClear->clearRTColor[2]);
+clearData[3] = *(DWORD*)&(pClear->clearRTColor[3]);

PFN_CLEAR_TILES pfnClearTiles = 
sClearTilesTable[KNOB_COLOR_HOT_TILE_FORMAT];
SWR_ASSERT(pfnClearTiles != nullptr);
--
2.7.3
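
Since the clear colour already arrives as four floats, the only remaining step is to copy their bit patterns into the DWORD array unchanged. The sketch below shows that step using std::memcpy instead of the pointer casts in the patch, which is a deliberate substitution to sidestep aliasing concerns. PackClearColor is a hypothetical helper, and 32-bit IEEE 754 floats are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the raw bits of four float clear values into four 32-bit words.
// No numeric conversion takes place; 1.0f stays the bit pattern of 1.0f.
inline void PackClearColor(const float (&clearColor)[4],
                           std::uint32_t (&clearData)[4])
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes 32-bit floats");
    std::memcpy(clearData, clearColor, sizeof(clearData));
}
```

The removed code instead reinterpreted the first four bytes of the float data as unorm8 channels and divided by 255, which no longer matches what callers pass in.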




Re: [Mesa-dev] [PATCH v2 2/6] swr: [rasterizer memory] minify texture width before alignment

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 17, 2016, at 10:56 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The minification should happen before alignment, not after. See similar
logic on ComputeLODOffsetY. The current logic requires unnecessarily
large textures when there's an initial NPOT size.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h 
b/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
index 11ed451..350e44b 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
@@ -284,8 +284,8 @@ INLINE void ComputeLODOffset1D(
offset = GFX_ALIGN(curWidth, hAlign);
for (uint32_t l = 1; l < lod; ++l)
{
-curWidth = GFX_ALIGN(std::max(curWidth >> 1, 1U), 
hAlign);
-offset += curWidth;
+curWidth = std::max(curWidth >> 1, 1U);
+offset += GFX_ALIGN(curWidth, hAlign);
}

if (info.isSubsampled || info.isBC)
--
2.7.3
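
The effect of the ordering is easiest to see with numbers. The sketch below reproduces both loop bodies from the hunk above in standalone form, assuming lod >= 1 and a power-of-two hAlign. AlignUp, LodOffsetMinifyFirst and LodOffsetAlignFirst are hypothetical names, with AlignUp standing in for GFX_ALIGN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rounds v up to a multiple of a; a must be a power of two.
inline std::uint32_t AlignUp(std::uint32_t v, std::uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

// New behaviour: minify the true width, align only what is added to offset.
inline std::uint32_t LodOffsetMinifyFirst(std::uint32_t baseWidth,
                                          std::uint32_t hAlign,
                                          std::uint32_t lod)
{
    std::uint32_t curWidth = baseWidth;
    std::uint32_t offset = AlignUp(curWidth, hAlign);
    for (std::uint32_t l = 1; l < lod; ++l) {
        curWidth = std::max<std::uint32_t>(curWidth >> 1, 1u);
        offset += AlignUp(curWidth, hAlign);
    }
    return offset;
}

// Old behaviour: the aligned width feeds the next minification step,
// so rounding error accumulates from level to level.
inline std::uint32_t LodOffsetAlignFirst(std::uint32_t baseWidth,
                                         std::uint32_t hAlign,
                                         std::uint32_t lod)
{
    std::uint32_t curWidth = baseWidth;
    std::uint32_t offset = AlignUp(curWidth, hAlign);
    for (std::uint32_t l = 1; l < lod; ++l) {
        curWidth = AlignUp(std::max<std::uint32_t>(curWidth >> 1, 1u), hAlign);
        offset += curWidth;
    }
    return offset;
}
```

For baseWidth = 34, hAlign = 4 and lod = 3, minify-first gives 36 + 20 + 8 = 64 while align-first gives 36 + 20 + 12 = 68, which illustrates the unnecessarily large layout for NPOT sizes.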




Re: [Mesa-dev] [PATCH v2 1/6] swr: [rasterizer memory] minify original sizes for block formats

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 17, 2016, at 10:56 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

There's no guarantee that mip width/height will be a multiple of the
compressed block size. Doing a divide by the block size first yields
different results than GL expects, so we do the divide at the end.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
.../swr/rasterizer/memory/TilingFunctions.h| 36 +++---
1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h 
b/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
index 0694a99..11ed451 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/TilingFunctions.h
@@ -274,9 +274,12 @@ INLINE void ComputeLODOffset1D(
else
{
uint32_t curWidth = baseWidth;
-// translate mip width from pixels to blocks for block compressed 
formats
-// @note hAlign is already in blocks for compressed formats so no need 
to convert
-if (info.isBC) curWidth /= info.bcWidth;
+// @note hAlign is already in blocks for compressed formats so 
upconvert
+//   so that we have the desired alignment post-divide.
+if (info.isBC)
+{
+hAlign *= info.bcWidth;
+}

offset = GFX_ALIGN(curWidth, hAlign);
for (uint32_t l = 1; l < lod; ++l)
@@ -285,7 +288,7 @@ INLINE void ComputeLODOffset1D(
offset += curWidth;
}

-if (info.isSubsampled)
+if (info.isSubsampled || info.isBC)
{
offset /= info.bcWidth;
}
@@ -312,14 +315,17 @@ INLINE void ComputeLODOffsetX(
else
{
uint32_t curWidth = baseWidth;
-// convert mip width from pixels to blocks for block compressed formats
-// @note hAlign is already in blocks for compressed formats so no need 
to convert
-if (info.isBC) curWidth /= info.bcWidth;
+// @note hAlign is already in blocks for compressed formats so 
upconvert
+//   so that we have the desired alignment post-divide.
+if (info.isBC)
+{
+hAlign *= info.bcWidth;
+}

curWidth = std::max(curWidth >> 1, 1U);
curWidth = GFX_ALIGN(curWidth, hAlign);

-if (info.isSubsampled)
+if (info.isSubsampled || info.isBC)
{
curWidth /= info.bcWidth;
}
@@ -350,9 +356,12 @@ INLINE void ComputeLODOffsetY(
offset = 0;
uint32_t mipHeight = baseHeight;

-// translate mip height from pixels to blocks for block compressed 
formats
-// @note VAlign is already in blocks for compressed formats so no need 
to convert
-if (info.isBC) mipHeight /= info.bcHeight;
+// @note vAlign is already in blocks for compressed formats so 
upconvert
+//   so that we have the desired alignment post-divide.
+if (info.isBC)
+{
+vAlign *= info.bcHeight;
+}

for (uint32_t l = 1; l <= lod; ++l)
{
@@ -360,6 +369,11 @@ INLINE void ComputeLODOffsetY(
offset += ((l != 2) ? alignedMipHeight : 0);
mipHeight = std::max(mipHeight >> 1, 1U);
}
+
+if (info.isBC)
+{
+offset /= info.bcHeight;
+}
}
}

--
2.7.3
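
A small numeric sketch of why the order of "divide by block size" and "minify" matters. It is a simplification of the patch: it uses a round-up division rather than the hAlign-based alignment, and it looks at a single mip width rather than an accumulated offset. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Minify in pixels, convert to blocks at the end (rounding up).
inline std::uint32_t MipWidthBlocksDivideLast(std::uint32_t baseWidthPx,
                                              std::uint32_t blockWidth,
                                              std::uint32_t level)
{
    assert(blockWidth != 0 && level < 32);
    const std::uint32_t mipWidthPx =
        std::max<std::uint32_t>(baseWidthPx >> level, 1u);
    return (mipWidthPx + blockWidth - 1) / blockWidth;
}

// Convert to blocks first, then minify the block count.
inline std::uint32_t MipWidthBlocksDivideFirst(std::uint32_t baseWidthPx,
                                               std::uint32_t blockWidth,
                                               std::uint32_t level)
{
    assert(blockWidth != 0 && level < 32);
    const std::uint32_t baseBlocks = baseWidthPx / blockWidth;
    return std::max<std::uint32_t>(baseBlocks >> level, 1u);
}
```

For a 10-pixel-wide texture with 4-pixel blocks, level 1 is 5 pixels wide and needs 2 blocks; dividing first gives 10 / 4 = 2 blocks at level 0 and only 1 block at level 1.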




Re: [Mesa-dev] [PATCH] swr: report a reasonable max lod bias

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 19, 2016, at 10:11 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

This is the same value that llvmpipe uses. Since swr uses the same
sampler logic, makes sense for this value to also be the same. Most
applications don't care.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

I kind of assume this is dependent on my layout patches since LODs weren't
always properly handled before.

src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 36afcc3..9affa02 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -406,7 +406,7 @@ swr_get_paramf(struct pipe_screen *screen, enum pipe_capf 
param)
   case PIPE_CAPF_MAX_TEXTURE_ANISOTROPY:
  return 0.0;
   case PIPE_CAPF_MAX_TEXTURE_LOD_BIAS:
-  return 0.0;
+  return 16.0; /* arbitrary */
   case PIPE_CAPF_GUARD_BAND_LEFT:
   case PIPE_CAPF_GUARD_BAND_TOP:
   case PIPE_CAPF_GUARD_BAND_RIGHT:
--
2.7.3




Re: [Mesa-dev] [PATCH v3] swr: calculate viewport width/height based on the scale

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 20, 2016, at 10:32 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The former calculations were for min/max y. The width/height don't take
translate into account.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

v2 -> v3:
- reduce viewport width when clamping the x/y offsets to 0
- subtract vp->y from height, not vp->x

Let's hope I don't need to write a v4 of this trivial patch.

src/gallium/drivers/swr/swr_state.cpp | 18 --
1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 520faea..0302439 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1018,9 +1018,9 @@ swr_update_derived(struct pipe_context *pipe,
  SWR_VIEWPORT_MATRICES *vpm = &ctx->derived.vpm;

  vp->x = state->translate[0] - state->scale[0];
-  vp->width = state->translate[0] + state->scale[0];
+  vp->width = 2 * state->scale[0];
  vp->y = state->translate[1] - fabs(state->scale[1]);
-  vp->height = state->translate[1] + fabs(state->scale[1]);
+  vp->height = 2 * fabs(state->scale[1]);
  util_viewport_zmin_zmax(state, rasterizer->clip_halfz,
  &vp->minZ, &vp->maxZ);

@@ -1033,10 +1033,16 @@ swr_update_derived(struct pipe_context *pipe,

  /* Now that the matrix is calculated, clip the view coords to screen
   * size.  OpenGL allows for -ve x,y in the viewport. */
-  vp->x = std::max(vp->x, 0.0f);
-  vp->y = std::max(vp->y, 0.0f);
-  vp->width = std::min(vp->width, (float)fb->width);
-  vp->height = std::min(vp->height, (float)fb->height);
+  if (vp->x < 0.0f) {
+ vp->width += vp->x;
+ vp->x = 0.0f;
+  }
+  if (vp->y < 0.0f) {
+ vp->height += vp->y;
+ vp->y = 0.0f;
+  }
+  vp->width = std::min(vp->width, (float)fb->width - vp->x);
+  vp->height = std::min(vp->height, (float)fb->height - vp->y);

  SwrSetViewports(ctx->swrContext, 1, vp, vpm);
   }
--
2.7.3
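
The v3 logic can be restated as one standalone function, which makes the clamping order easier to check. This is a sketch under the assumption that scale and translate are the usual viewport transform terms (half-extent and centre); Viewport and DeriveViewport are hypothetical names, not swr types.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical reduced viewport rectangle.
struct Viewport { float x, y, width, height; };

// Derives the viewport rectangle from scale/translate and clips it to the
// framebuffer. Width and height come from the scale alone.
inline Viewport DeriveViewport(float scaleX, float scaleY,
                               float translateX, float translateY,
                               float fbWidth, float fbHeight)
{
    Viewport vp;
    vp.x = translateX - scaleX;
    vp.width = 2.0f * scaleX;
    vp.y = translateY - std::fabs(scaleY);
    vp.height = 2.0f * std::fabs(scaleY);

    // A negative origin is moved to 0 and the extent shrinks by the same amount.
    if (vp.x < 0.0f) { vp.width += vp.x; vp.x = 0.0f; }
    if (vp.y < 0.0f) { vp.height += vp.y; vp.y = 0.0f; }

    // The extent may not reach past the framebuffer edge.
    vp.width = std::min(vp.width, fbWidth - vp.x);
    vp.height = std::min(vp.height, fbHeight - vp.y);
    return vp;
}
```

For a viewport of origin (-10, 20) and size 100x50 on a 64x64 framebuffer (scale 50 and 25, translate 40 and 45), the result is x = 0, width = 64, y = 20, height = 44.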




Re: [Mesa-dev] [PATCH] swr: don't claim to allow setting layer/viewport from VS

2016-11-21 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 20, 2016, at 12:20 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

This may ultimately be possible to support, but for now it's not hooked
up and the swr core only supports this output from GS.

This normally wouldn't matter, but we lie about supporting GL 3.2, and
also the blitter and st/mesa will make use of this functionality if
claimed.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 9affa02..bbecee5 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -252,10 +252,10 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_USER_CONSTANT_BUFFERS:
   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
   case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
-   case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
  return 1;
   case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
  return 16;
+   case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
   case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
   case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
   case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
--
2.7.3




Re: [Mesa-dev] [PATCH 1/5] swr: only broadcast color0 value, not all color values

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The way that dual-source blending is described for GLES2 is very odd,
and we end up with a shader that both has this property set *and* has a
color1 value to be used as the second source. While changing the state
tracker is an option, it seems more reliable to verify that the
broadcast is only done on color0.

Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_shader.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index e4f9796..2f72239 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -645,7 +645,8 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)

LLVMValueRef out =
   LLVMBuildLoad(gallivm->builder, outputs[attrib][channel], "");
-if 
(swr_fs->info.base.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS]) {
+if 
(swr_fs->info.base.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS] &&
+swr_fs->info.base.output_semantic_index[attrib] == 0) {
   for (uint32_t rt = 0; rt < key.nr_cbufs; rt++) {
  STORE(unwrap(out),
pPS,
--
2.7.3




Re: [Mesa-dev] [PATCH 2/5] swr: flatshading makes color outputs flat, it doesn't affect others

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

We were previously not marking the "regular" flat outputs as flat when
flatshading was enabled.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_state.cpp | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index dcbe434..8541aca 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1490,10 +1490,8 @@ swr_update_derived(struct pipe_context *pipe,
  (ctx->rasterizer->sprite_coord_enable ? 1 : 0);
   for (unsigned i = 0; i < backendState.numAttributes; i++)
  backendState.numComponents[i] = 4;
-   backendState.constantInterpolationMask =
-  ctx->rasterizer->flatshade ?
-  ctx->fs->flatConstantMask :
-  ctx->fs->constantMask;
+   backendState.constantInterpolationMask = ctx->fs->constantMask |
+  (ctx->rasterizer->flatshade ? ctx->fs->flatConstantMask : 0);
   backendState.pointSpriteTexCoordMask = ctx->fs->pointSpriteMask;

   SwrSetBackendState(ctx->swrContext, &backendState);
--
2.7.3
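
The mask arithmetic in this hunk, restated as a standalone function. ConstantInterpolationMask is a hypothetical helper; the bit positions are attribute slots, as in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Attributes declared constant (flat) are always interpolated as constant.
// Colour attributes join them only while flatshading is enabled.
inline std::uint32_t ConstantInterpolationMask(std::uint32_t constantMask,
                                               std::uint32_t flatConstantMask,
                                               bool flatshade)
{
    return constantMask | (flatshade ? flatConstantMask : 0u);
}
```

With constantMask = 0x2 and flatConstantMask = 0x4, flatshading yields 0x6. The replaced expression selected one mask or the other and would have returned 0x4, dropping the attribute that was declared flat.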




Re: [Mesa-dev] [PATCH 3/5] swr: rework vert <-> frag shader linkage logic

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Fixes a few things:
- sprite coords only apply to generic varyings, and are a bitmask
- back color only applies in 2-sided lighting mode
- handle some odd situations between only some front/back colors being
  there. This is only semi-legal in GL, but we shouldn't start
  crashing.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_shader.cpp | 93 ++
1 file changed, 50 insertions(+), 43 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index 2f72239..d29f635 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -372,15 +372,6 @@ locate_linkage(ubyte name, ubyte index, struct 
tgsi_shader_info *info)
  }
   }

-   if (name == TGSI_SEMANTIC_COLOR) { // BCOLOR fallback
-  for (int i = 0; i < PIPE_MAX_SHADER_OUTPUTS; i++) {
- if ((info->output_semantic_name[i] == TGSI_SEMANTIC_BCOLOR)
- && (info->output_semantic_index[i] == index)) {
-return i - 1; // position is not part of the linkage
- }
-  }
-   }
-
   return 0x;
}

@@ -523,54 +514,70 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)

  unsigned linkedAttrib =
 locate_linkage(semantic_name, semantic_idx, &ctx->vs->info.base);
-  if (linkedAttrib == 0x) {
- // not found - check for point sprite
- if (ctx->rasterizer->sprite_coord_enable) {
-linkedAttrib = ctx->vs->info.base.num_outputs - 1;
-swr_fs->pointSpriteMask |= (1 << linkedAttrib);
- } else {
-fprintf(stderr,
-"Missing %s[%d]\n",
-tgsi_semantic_names[semantic_name],
-semantic_idx);
-assert(0 && "attribute linkage not found");
+  if (semantic_name == TGSI_SEMANTIC_GENERIC &&
+  ctx->rasterizer->sprite_coord_enable & (1 << semantic_idx)) {
+ /* we add an extra attrib to the backendState in swr_update_derived. 
*/
+ linkedAttrib = ctx->vs->info.base.num_outputs - 1;
+ swr_fs->pointSpriteMask |= (1 << linkedAttrib);
+  } else if (linkedAttrib == 0x) {
+ inputs[attrib][0] = wrap(VIMMED1(0.0f));
+ inputs[attrib][1] = wrap(VIMMED1(0.0f));
+ inputs[attrib][2] = wrap(VIMMED1(0.0f));
+ inputs[attrib][3] = wrap(VIMMED1(1.0f));
+ /* If we're reading in color and 2-sided lighting is enabled, we have
+  * to keep going.
+  */
+ if (semantic_name != TGSI_SEMANTIC_COLOR || !key.light_twoside)
+continue;
+  } else {
+ if (interpMode == TGSI_INTERPOLATE_CONSTANT) {
+swr_fs->constantMask |= 1 << linkedAttrib;
+ } else if (interpMode == TGSI_INTERPOLATE_COLOR) {
+swr_fs->flatConstantMask |= 1 << linkedAttrib;
 }
  }

-  if (interpMode == TGSI_INTERPOLATE_CONSTANT) {
- swr_fs->constantMask |= 1 << linkedAttrib;
-  } else if (interpMode == TGSI_INTERPOLATE_COLOR) {
- swr_fs->flatConstantMask |= 1 << linkedAttrib;
-  }
-
-  for (int channel = 0; channel < TGSI_NUM_CHANNELS; channel++) {
- if (mask & (1 << channel)) {
-Value *indexA = C(linkedAttrib * 12 + channel);
-Value *indexB = C(linkedAttrib * 12 + channel + 4);
-Value *indexC = C(linkedAttrib * 12 + channel + 8);
+  unsigned bcolorAttrib = 0x;
+  Value *offset = NULL;
+  if (semantic_name == TGSI_SEMANTIC_COLOR && key.light_twoside) {
+ bcolorAttrib = locate_linkage(
+   TGSI_SEMANTIC_BCOLOR, semantic_idx, &ctx->vs->info.base);
+ /* Neither front nor back colors were available. Nothing to load. */
+ if (bcolorAttrib == 0x && linkedAttrib == 0x)
+continue;
+ /* If there is no front color, just always use the back color. */
+ if (linkedAttrib == 0x)
+linkedAttrib = bcolorAttrib;

-if ((semantic_name == TGSI_SEMANTIC_COLOR)
-&& ctx->rasterizer->light_twoside) {
-   unsigned bcolorAttrib = locate_linkage(
-  TGSI_SEMANTIC_BCOLOR, semantic_idx, &ctx->vs->info.base);
+ if (bcolorAttrib != 0x) {
+if (interpMode == TGSI_INTERPOLATE_CONSTANT) {
+   swr_fs->constantMask |= 1 << bcolorAttrib;
+} else if (interpMode == TGSI_INTERPOLATE_COLOR) {
+   swr_fs->flatConstantMask |= 1 << bcolorAttrib;
+}

-   unsigned diff = 12 * (bcolorAttrib - linkedAttrib);
+unsigned diff = 12 * (bcolorAttrib - linkedAttrib);

+if (diff) {
   Value *back =
  XOR(C(1), LOAD(pPS, {0, SWR_PS_CONT
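
A minimal sketch of the linkage rules listed in the [PATCH 3/5] commit
message above, written as standalone C++ rather than the driver's LLVM
builder code. The enums, the kNotFound sentinel and resolveInput() are
hypothetical names used only for illustration; the rules themselves
(sprite coords apply to generic varyings through a per-index bitmask,
back color is only consulted in 2-sided lighting mode, a missing front
color falls back to the back color, and nothing crashes when neither is
present) are taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the TGSI semantic names used by the patch.
enum class Semantic { Generic, Color, Other };

// Hypothetical sentinel meaning "no matching vertex shader output".
constexpr uint32_t kNotFound = UINT32_MAX;

enum class Source { PointSprite, DefaultValue, FrontOnly, BackOnly, FrontOrBack };

// Decide where a fragment shader input gets its value from.
//   frontSlot / backSlot: VS output slots of the input and of its BCOLOR twin
//   spriteCoordEnable:    bitmask indexed by the generic semantic index
Source resolveInput(Semantic name, unsigned index, uint32_t frontSlot,
                    uint32_t backSlot, uint32_t spriteCoordEnable,
                    bool lightTwoSide)
{
    assert(index < 32); // the mask has one bit per generic index

    // Sprite coordinates replace only generic varyings whose bit is set.
    if (name == Semantic::Generic && (spriteCoordEnable & (1u << index)))
        return Source::PointSprite;

    // Back colors matter only for color inputs in 2-sided lighting mode.
    const bool wantBack = (name == Semantic::Color) && lightTwoSide;
    const bool haveFront = frontSlot != kNotFound;
    const bool haveBack = wantBack && backSlot != kNotFound;

    if (!haveFront && !haveBack)
        return Source::DefaultValue; // load (0, 0, 0, 1) instead of asserting
    if (!haveFront)
        return Source::BackOnly;     // no front color: always use the back color
    if (!haveBack)
        return Source::FrontOnly;
    return Source::FrontOrBack;      // chosen per primitive from its facing
}
```

Note that a color input never takes the point-sprite path, even with
every bit of the mask set, because the check is gated on the generic
semantic.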

Re: [Mesa-dev] [PATCH 4/5] swr: add sprite coord enable mask to fs key

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_shader.cpp | 3 ++-
src/gallium/drivers/swr/swr_shader.h   | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index d29f635..428c9b3 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -131,6 +131,7 @@ swr_generate_fs_key(struct swr_jit_fs_key &key,

   key.nr_cbufs = ctx->framebuffer.nr_cbufs;
   key.light_twoside = ctx->rasterizer->light_twoside;
+   key.sprite_coord_enable = ctx->rasterizer->sprite_coord_enable;
   memcpy(&key.vs_output_semantic_name,
  &ctx->vs->info.base.output_semantic_name,
  sizeof(key.vs_output_semantic_name));
@@ -515,7 +516,7 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)
  unsigned linkedAttrib =
 locate_linkage(semantic_name, semantic_idx, &ctx->vs->info.base);
  if (semantic_name == TGSI_SEMANTIC_GENERIC &&
-  ctx->rasterizer->sprite_coord_enable & (1 << semantic_idx)) {
+  key.sprite_coord_enable & (1 << semantic_idx)) {
 /* we add an extra attrib to the backendState in swr_update_derived. */
 linkedAttrib = ctx->vs->info.base.num_outputs - 1;
 swr_fs->pointSpriteMask |= (1 << linkedAttrib);
diff --git a/src/gallium/drivers/swr/swr_shader.h 
b/src/gallium/drivers/swr/swr_shader.h
index ccdda44..7e3399c 100644
--- a/src/gallium/drivers/swr/swr_shader.h
+++ b/src/gallium/drivers/swr/swr_shader.h
@@ -51,6 +51,7 @@ struct swr_jit_sampler_key {
struct swr_jit_fs_key : swr_jit_sampler_key {
   unsigned nr_cbufs;
   unsigned light_twoside;
+   unsigned sprite_coord_enable;
   ubyte vs_output_semantic_name[PIPE_MAX_SHADER_OUTPUTS];
   ubyte vs_output_semantic_idx[PIPE_MAX_SHADER_OUTPUTS];
};
--
2.7.3
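
A rough sketch of why the sprite coord mask belongs in the key,
assuming (as the name swr_jit_fs_key suggests) that compiled fragment
shader variants are looked up by key. FsKey and sameVariant() are
hypothetical, reduced names, not the driver's types. Any state that
the generated code reads has to be part of the key, otherwise a
variant compiled for one mask would be reused for another.

```cpp
#include <cstring>

// Hypothetical, reduced key: only the fields visible in the patch context.
struct FsKey {
    unsigned nr_cbufs;
    unsigned light_twoside;
    unsigned sprite_coord_enable; // the field added by this patch
};

// All members are unsigned, so there is no padding and a bytewise
// comparison is a valid equality test for this sketch.
static_assert(sizeof(FsKey) == 3 * sizeof(unsigned), "unexpected padding");

// Two draws may share a compiled variant only if their keys are identical.
bool sameVariant(const FsKey &a, const FsKey &b)
{
    return std::memcmp(&a, &b, sizeof(FsKey)) == 0;
}
```

With the mask in the key, toggling coord replacement on a single
generic varying yields a different key and therefore a separate
variant.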




Re: [Mesa-dev] [PATCH 5/5] swr: color interpolation is also supposed to get perspective division

2016-11-22 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 21, 2016, at 11:52 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_shader.cpp | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
b/src/gallium/drivers/swr/swr_shader.cpp
index 428c9b3..294a568 100644
--- a/src/gallium/drivers/swr/swr_shader.cpp
+++ b/src/gallium/drivers/swr/swr_shader.cpp
@@ -457,7 +457,8 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)

  // load/compute w
  Value *vw = nullptr, *pAttribs;
-  if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE) {
+  if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE ||
+  interpMode == TGSI_INTERPOLATE_COLOR) {
 pAttribs = pPerspAttribs;
 switch (interpLoc) {
 case TGSI_INTERPOLATE_LOC_CENTER:
@@ -596,7 +597,8 @@ BuilderSWR::CompileFS(struct swr_context *ctx, 
swr_jit_fs_key &key)
   Value *interp1 = FMUL(vb, vj);
   interp = FADD(interp, interp1);
   interp = FADD(interp, vc);
-   if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE)
+   if (interpMode == TGSI_INTERPOLATE_PERSPECTIVE ||
+   interpMode == TGSI_INTERPOLATE_COLOR)
  interp = FMUL(interp, vw);
   inputs[attrib][channel] = wrap(interp);
}
--
2.7.3
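
A scalar sketch of the final step touched by the second hunk of
[PATCH 5/5], under the assumption that the three attribute terms are
combined as in the diff context (a*i + b*j + c) and that w is the
per-pixel factor the driver holds in vw. InterpMode and interpolate()
are hypothetical names. The point of the patch is only that COLOR now
takes the same multiply-by-w path as PERSPECTIVE.

```cpp
// Hypothetical stand-in for the TGSI interpolation modes.
enum class InterpMode { Constant, Linear, Perspective, Color };

// a, b, c: attribute terms, i, j: barycentric weights, w: perspective factor.
float interpolate(InterpMode mode, float a, float b, float c,
                  float i, float j, float w)
{
    float v = a * i + b * j + c;

    // Color inputs get the perspective multiply too, not just
    // explicitly perspective-qualified ones.
    if (mode == InterpMode::Perspective || mode == InterpMode::Color)
        v *= w;

    return v;
}
```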




Re: [Mesa-dev] [PATCH 3/4] swr: [rasterizer core] pipe renderTargetArrayIndex through to clears

2016-11-23 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Currently clears only operate on the 0th array index (ignoring surface
layout parameters). Instead normalize to take a RTAI like all the
load/store tile logic does, and use ComputeSurfaceAddress to properly
take the surface state's lod/array index into account.
---
src/gallium/drivers/swr/rasterizer/core/api.cpp  |  3 +++
src/gallium/drivers/swr/rasterizer/core/api.h|  5 -
src/gallium/drivers/swr/rasterizer/core/backend.cpp  | 20 ++--
src/gallium/drivers/swr/rasterizer/core/context.h|  1 +
.../drivers/swr/rasterizer/memory/ClearTile.cpp  | 20 +---
src/gallium/drivers/swr/swr_clear.cpp|  2 +-
src/gallium/drivers/swr/swr_memory.h |  4 +++-
7 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 6ade65a..383a7ad 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -1476,6 +1476,7 @@ void SWR_API SwrStoreTiles(
/// @brief SwrClearRenderTarget - Clear attached render targets / depth / 
stencil
/// @param hContext - Handle passed back from SwrCreateContext
/// @param attachmentMask - combination of SWR_ATTACHMENT_*_BIT attachments to 
clear
+/// @param renderTargetArrayIndex - the RT array index to clear
/// @param clearColor - color use for clearing render targets
/// @param z - depth value use for clearing depth buffer
/// @param stencil - stencil value used for clearing stencil buffer
@@ -1483,6 +1484,7 @@ void SWR_API SwrStoreTiles(
void SWR_API SwrClearRenderTarget(
HANDLE hContext,
uint32_t attachmentMask,
+uint32_t renderTargetArrayIndex,
const float clearColor[4],
float z,
uint8_t stencil,
@@ -1503,6 +1505,7 @@ void SWR_API SwrClearRenderTarget(
pDC->FeWork.desc.clear.rect = clearRect;
pDC->FeWork.desc.clear.rect &= g_MaxScissorRect;
pDC->FeWork.desc.clear.attachmentMask = attachmentMask;
+pDC->FeWork.desc.clear.renderTargetArrayIndex = renderTargetArrayIndex;
pDC->FeWork.desc.clear.clearDepth = z;
pDC->FeWork.desc.clear.clearRTColor[0] = clearColor[0];
pDC->FeWork.desc.clear.clearRTColor[1] = clearColor[1];
diff --git a/src/gallium/drivers/swr/rasterizer/core/api.h 
b/src/gallium/drivers/swr/rasterizer/core/api.h
index 1a41637..d0f29dd 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.h
+++ b/src/gallium/drivers/swr/rasterizer/core/api.h
@@ -137,10 +137,11 @@ typedef void(SWR_API *PFN_STORE_TILE)(HANDLE 
hPrivateContext, SWR_FORMAT srcForm
/// @param renderTargetIndex - render target to store, can be color, depth or 
stencil
/// @param x - destination x coordinate
/// @param y - destination y coordinate
+/// @param renderTargetArrayIndex - render target array offset from arrayIndex
/// @param pClearColor - pointer to the hot tile's clear value
typedef void(SWR_API *PFN_CLEAR_TILE)(HANDLE hPrivateContext,
SWR_RENDERTARGET_ATTACHMENT rtIndex,
-uint32_t x, uint32_t y, const float* pClearColor);
+uint32_t x, uint32_t y, uint32_t renderTargetArrayIndex, const float* 
pClearColor);

//
/// @brief Callback to allow driver to update their copy of streamout write 
offset.
@@ -559,6 +560,7 @@ void SWR_API SwrStoreTiles(
/// @brief SwrClearRenderTarget - Clear attached render targets / depth / 
stencil
/// @param hContext - Handle passed back from SwrCreateContext
/// @param attachmentMask - combination of SWR_ATTACHMENT_*_BIT attachments to 
clear
+/// @param renderTargetArrayIndex - the RT array index to clear
/// @param clearColor - color use for clearing render targets
/// @param z - depth value use for clearing depth buffer
/// @param stencil - stencil value used for clearing stencil buffer
@@ -566,6 +568,7 @@ void SWR_API SwrStoreTiles(
void SWR_API SwrClearRenderTarget(
HANDLE hContext,
uint32_t attachmentMask,
+uint32_t renderTargetArrayIndex,
const float clearColor[4],
float z,
uint8_t stencil,
diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
index 45eff15..c45c0a7 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
@@ -37,7 +37,7 @@

#include 

-typedef void(*PFN_CLEAR_TILES)(DRAW_CONTEXT*, SWR_RENDERTARGET_ATTACHMENT rt, 
uint32_t, DWORD[4], const SWR_RECT& rect);
+typedef void(*PFN_CLEAR_TILES)(DRAW_CONTEXT*, SWR_RENDERTARGET_ATTACHMENT rt, 
uint32_t, uint32_t, DWORD[4], const SWR_RECT& rect);
static PFN_CLEAR_TILES sClearTilesTable[NUM_SWR_FORMATS];

//
@@ -134,7 +134,7 @@ void ClearRasterTile(uint8_t *pTileBuffer, simd16vecto
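
To illustrate the commit message of [PATCH 3/4] above: a clear that
takes a render target array index has to add it to the surface's own
base array index before addressing memory. The sketch below assumes a
drastically simplified surface (one 32-bit value per texel, layers
stored back to back); Surface and clearLayer() are hypothetical names
and do not model ComputeSurfaceAddress or LODs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified surface. arrayIndex is the first real layer
// that this view of the resource exposes.
struct Surface {
    uint32_t arrayIndex;
    uint32_t layerSize;            // texels per layer
    std::vector<uint32_t> texels;  // all layers, stored back to back
};

// Clear one layer, addressed relative to the view's base layer.
void clearLayer(Surface &surf, uint32_t renderTargetArrayIndex, uint32_t value)
{
    const std::size_t layer =
        std::size_t(surf.arrayIndex) + renderTargetArrayIndex;
    const std::size_t begin = layer * surf.layerSize;
    assert(begin + surf.layerSize <= surf.texels.size());

    std::fill_n(surf.texels.begin() + static_cast<std::ptrdiff_t>(begin),
                surf.layerSize, value);
}
```

A clear that always used index 0 would only ever touch layer
arrayIndex, which is the behaviour the patch description says it is
replacing.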

Re: [Mesa-dev] [PATCH 1/4] swr: [rasterizer core] actually perform clear before store in GetHotTile

2016-11-23 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

When switching render target array indexes (as might happen in a GS, or
in a future change, with layered clears), if the previous state is
HOTTILE_CLEAR, we should actually clear the tile before saving it off.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp | 12 
1 file changed, 12 insertions(+)

diff --git a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp 
b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
index 804fc4f..f398667 100644
--- a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
@@ -149,6 +149,18 @@ HOTTILE* HotTileMgr::GetHotTile(SWR_CONTEXT* pContext, 
DRAW_CONTEXT* pDC, uint32
default: SWR_ASSERT(false, "Unknown attachment: %d", attachment); 
format = KNOB_COLOR_HOT_TILE_FORMAT; break;
}

+if (hotTile.state == HOTTILE_CLEAR)
+{
+if (attachment == SWR_ATTACHMENT_STENCIL)
+ClearStencilHotTile(&hotTile);
+else if (attachment == SWR_ATTACHMENT_DEPTH)
+ClearDepthHotTile(&hotTile);
+else
+ClearColorHotTile(&hotTile);
+
+hotTile.state = HOTTILE_DIRTY;
+}
+
if (hotTile.state == HOTTILE_DIRTY)
{
pContext->pfnStoreTile(GetPrivateState(pDC), format, attachment,
--
2.7.3
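
A small model of the ordering this patch establishes, assuming a hot
tile is just a pixel buffer plus a state and a pending clear value.
TileState, HotTile and flushHotTile() are hypothetical names. The
state left behind after the store is not shown in the diff, so the
sketch leaves it untouched.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class TileState { Invalid, Clear, Dirty };

// Hypothetical, simplified hot tile.
struct HotTile {
    TileState state;
    uint32_t clearValue;           // pending clear, valid in the Clear state
    std::vector<uint32_t> pixels;
};

// Flush a tile before it is reused for another array index.
// Returns true if anything was written to the surface.
bool flushHotTile(HotTile &tile, std::vector<uint32_t> &surface)
{
    // A pending clear has not touched the pixels yet; apply it first so
    // that the store below writes the clear color, not stale data.
    if (tile.state == TileState::Clear) {
        std::fill(tile.pixels.begin(), tile.pixels.end(), tile.clearValue);
        tile.state = TileState::Dirty;
    }

    if (tile.state == TileState::Dirty) {
        surface = tile.pixels;
        return true;
    }
    return false;
}
```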




Re: [Mesa-dev] [PATCH 4/4] swr: clear every layer of the attached surfaces

2016-11-23 Thread Rowley, Timothy O
This code seems to assume that all attached buffers have the same start layer, 
and that start will be zero.  Maybe it should construct the clearMask inside 
the layer loop, which would also be a bit clearer than the code you added to 
drop bits out of the mask?

-Tim
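
For readers following the thread, the alternative raised above could
look roughly like the sketch below: rebuild the mask for each layer
from the attachments that still have that layer, instead of starting
from a full mask and dropping bits. Attachment and
perLayerClearMasks() are hypothetical names, and this is not code from
the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one attachment selected for clearing.
struct Attachment {
    uint32_t bit;  // its bit in the clear mask
    int layers;    // last_layer - first_layer + 1
};

// One clear mask per layer, built inside the layer loop.
std::vector<uint32_t> perLayerClearMasks(const std::vector<Attachment> &atts)
{
    int maxLayers = 0;
    for (const Attachment &a : atts)
        maxLayers = std::max(maxLayers, a.layers);

    std::vector<uint32_t> masks;
    for (int layer = 0; layer < maxLayers; ++layer) {
        uint32_t mask = 0;
        for (const Attachment &a : atts)
            if (layer < a.layers) // attachment still has this layer
                mask |= a.bit;
        masks.push_back(mask);
    }
    return masks;
}
```

Each mask would then be passed to one clear call for its layer.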

> On Nov 17, 2016, at 6:51 PM, Ilia Mirkin  wrote:
> 
> Signed-off-by: Ilia Mirkin 
> ---
> 
> With this patch, the layered-rendering clear tests pass, both with fast clear
> enabled and disabled.
> 
> src/gallium/drivers/swr/swr_clear.cpp | 35 +--
> 1 file changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
> b/src/gallium/drivers/swr/swr_clear.cpp
> index 25f066e..7ac308e 100644
> --- a/src/gallium/drivers/swr/swr_clear.cpp
> +++ b/src/gallium/drivers/swr/swr_clear.cpp
> @@ -35,6 +35,7 @@ swr_clear(struct pipe_context *pipe,
>struct pipe_framebuffer_state *fb = &ctx->framebuffer;
> 
>UINT clearMask = 0;
> +   int layers = 0;
> 
>if (!swr_check_render_cond(pipe))
>   return;
> @@ -44,24 +45,46 @@ swr_clear(struct pipe_context *pipe,
> 
>if (buffers & PIPE_CLEAR_COLOR && fb->nr_cbufs) {
>   for (unsigned i = 0; i < fb->nr_cbufs; ++i)
> - if (fb->cbufs[i])
> + if (fb->cbufs[i] && (buffers & (PIPE_CLEAR_COLOR0 << i))) {
> clearMask |= (SWR_ATTACHMENT_COLOR0_BIT << i);
> +layers = std::max(layers, fb->cbufs[i]->u.tex.last_layer -
> +  fb->cbufs[i]->u.tex.first_layer + 1);
> + }
>}
> 
> -   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf)
> +   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf) {
>   clearMask |= SWR_ATTACHMENT_DEPTH_BIT;
> +  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
> +fb->zsbuf->u.tex.first_layer + 1);
> +   }
> 
> -   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf)
> +   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf) {
>   clearMask |= SWR_ATTACHMENT_STENCIL_BIT;
> +  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
> +fb->zsbuf->u.tex.first_layer + 1);
> +   }
> 
> #if 0 // XXX HACK, override clear color alpha. On ubuntu, clears are
>   // transparent.
>((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your 
> const'd-ness */
> #endif
> 
> -   swr_update_draw_context(ctx);
> -   SwrClearRenderTarget(ctx->swrContext, clearMask, 0, color->f, depth, 
> stencil,
> -ctx->swr_scissor);
> +   for (int i = 0; i < layers; ++i) {
> +  swr_update_draw_context(ctx);
> +  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
> +   color->f, depth, stencil,
> +   ctx->swr_scissor);
> +
> +  // Mask out the attachments that are out of layers.
> +  if (fb->zsbuf &&
> +  fb->zsbuf->u.tex.last_layer - fb->zsbuf->u.tex.first_layer <= i)
> + clearMask &= ~(SWR_ATTACHMENT_DEPTH_BIT | 
> SWR_ATTACHMENT_STENCIL_BIT);
> +  for (unsigned c = 0; c < fb->nr_cbufs; ++c) {
> + const struct pipe_surface *sf = fb->cbufs[c];
> + if (sf && sf->u.tex.last_layer - sf->u.tex.first_layer <= i)
> +clearMask &= ~(SWR_ATTACHMENT_COLOR0_BIT << c);
> +  }
> +   }
> }
> 
> 
> -- 
> 2.7.3
> 



Re: [Mesa-dev] [PATCH 4/4] swr: clear every layer of the attached surfaces

2016-11-23 Thread Rowley, Timothy O
Ah, didn’t notice that they were all shifted by arrayIndex.  Fine to leave the 
changes as they are, then.

This series of four patches (or rather, the rebased versions in your repo) is
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>


On Nov 23, 2016, at 2:11 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

On Wed, Nov 23, 2016 at 3:02 PM, Rowley, Timothy O
<timothy.o.row...@intel.com> wrote:
This code seems to assume that all attached buffers have the same start layer, 
and that start will be zero.  Maybe it should construct the clearMask inside 
the layer loop, which would also be a bit clearer than the code you added to 
drop bits out of the mask?

They have a logical start layer which is the same (0), since the real
start layer is in the SWR_SURFACE_STATE's arrayIndex. The arrayIndex
is added to the renderTargetArrayIndex to compute a final layer to
operate on.

If you'd like to simplify this code, I could just clear every
attachment/layer one at a time rather than trying to do it in fewer
steps. I suspect that the end effect on the swr backend will be
largely identical.

 -ilia


-Tim

On Nov 17, 2016, at 6:51 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

With this patch, the layered-rendering clear tests pass, both with fast clear
enabled and disabled.

src/gallium/drivers/swr/swr_clear.cpp | 35 +--
1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
b/src/gallium/drivers/swr/swr_clear.cpp
index 25f066e..7ac308e 100644
--- a/src/gallium/drivers/swr/swr_clear.cpp
+++ b/src/gallium/drivers/swr/swr_clear.cpp
@@ -35,6 +35,7 @@ swr_clear(struct pipe_context *pipe,
  struct pipe_framebuffer_state *fb = &ctx->framebuffer;

  UINT clearMask = 0;
+   int layers = 0;

  if (!swr_check_render_cond(pipe))
 return;
@@ -44,24 +45,46 @@ swr_clear(struct pipe_context *pipe,

  if (buffers & PIPE_CLEAR_COLOR && fb->nr_cbufs) {
 for (unsigned i = 0; i < fb->nr_cbufs; ++i)
- if (fb->cbufs[i])
+ if (fb->cbufs[i] && (buffers & (PIPE_CLEAR_COLOR0 << i))) {
   clearMask |= (SWR_ATTACHMENT_COLOR0_BIT << i);
+layers = std::max(layers, fb->cbufs[i]->u.tex.last_layer -
+  fb->cbufs[i]->u.tex.first_layer + 1);
+ }
  }

-   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf)
+   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf) {
 clearMask |= SWR_ATTACHMENT_DEPTH_BIT;
+  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
+fb->zsbuf->u.tex.first_layer + 1);
+   }

-   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf)
+   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf) {
 clearMask |= SWR_ATTACHMENT_STENCIL_BIT;
+  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
+fb->zsbuf->u.tex.first_layer + 1);
+   }

#if 0 // XXX HACK, override clear color alpha. On ubuntu, clears are
 // transparent.
  ((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your const'd-ness 
*/
#endif

-   swr_update_draw_context(ctx);
-   SwrClearRenderTarget(ctx->swrContext, clearMask, 0, color->f, depth, 
stencil,
-ctx->swr_scissor);
+   for (int i = 0; i < layers; ++i) {
+  swr_update_draw_context(ctx);
+  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
+   color->f, depth, stencil,
+   ctx->swr_scissor);
+
+  // Mask out the attachments that are out of layers.
+  if (fb->zsbuf &&
+  fb->zsbuf->u.tex.last_layer - fb->zsbuf->u.tex.first_layer <= i)
+ clearMask &= ~(SWR_ATTACHMENT_DEPTH_BIT | SWR_ATTACHMENT_STENCIL_BIT);
+  for (unsigned c = 0; c < fb->nr_cbufs; ++c) {
+ const struct pipe_surface *sf = fb->cbufs[c];
+ if (sf && sf->u.tex.last_layer - sf->u.tex.first_layer <= i)
+clearMask &= ~(SWR_ATTACHMENT_COLOR0_BIT << c);
+  }
+   }
}


--
2.7.3





Re: [Mesa-dev] [PATCH] swr: [rasterizer core] fix typo in scissor tile-alignment logic

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 25, 2016, at 7:35 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/core/api.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
b/src/gallium/drivers/swr/rasterizer/core/api.cpp
index 383a7ad..6c0d5dd 100644
--- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
@@ -765,7 +765,7 @@ void SetupMacroTileScissors(DRAW_CONTEXT *pDC)
tileAligned  = (scissorInFixedPoint.xmin % KNOB_TILE_X_DIM) == 0;
tileAligned &= (scissorInFixedPoint.ymin % KNOB_TILE_Y_DIM) == 0;
tileAligned &= (scissorInFixedPoint.xmax % KNOB_TILE_X_DIM) == 0;
-tileAligned &= (scissorInFixedPoint.xmax % KNOB_TILE_Y_DIM) == 0;
+tileAligned &= (scissorInFixedPoint.ymax % KNOB_TILE_Y_DIM) == 0;

pState->scissorsTileAligned &= tileAligned;

--
2.7.3
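
The one-character fix above is easy to miss, so here is a standalone
version of the check with a case the old code got wrong. Rect and
isTileAligned() are hypothetical names, and the sketch ignores the
fixed-point representation used by the real scissor rect.

```cpp
#include <cstdint>

struct Rect {
    int32_t xmin, ymin, xmax, ymax;
};

// True only if all four edges fall on tile boundaries.
bool isTileAligned(const Rect &r, int32_t tileX, int32_t tileY)
{
    bool aligned = (r.xmin % tileX) == 0;
    aligned &= (r.ymin % tileY) == 0;
    aligned &= (r.xmax % tileX) == 0;
    aligned &= (r.ymax % tileY) == 0; // the typo tested xmax here
    return aligned;
}
```

With 8x8 tiles, the rect {0, 0, 16, 12} is not tile aligned because of
ymax. A check that tests xmax twice would report it as aligned.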




Re: [Mesa-dev] [PATCH 1/4] swr: [rasterizer memory] add support for clearing Z32F_X32 and Z16

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 19, 2016, at 9:48 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp | 2 ++
1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp 
b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
index 717d12c..8501e21 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
+++ b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
@@ -282,7 +282,9 @@ void StoreHotTileClear(
memset(sStoreTilesClearDepthTable, 0, sizeof(sStoreTilesClearDepthTable)); \
\
sStoreTilesClearDepthTable[R32_FLOAT] = StoreMacroTileClear::StoreClear; \
+sStoreTilesClearDepthTable[R32_FLOAT_X8X24_TYPELESS] = 
StoreMacroTileClear::StoreClear; \
sStoreTilesClearDepthTable[R24_UNORM_X8_TYPELESS] = 
StoreMacroTileClear::StoreClear; \
+sStoreTilesClearDepthTable[R16_UNORM] = StoreMacroTileClear::StoreClear; \

//
/// @brief Sets up tables for ClearTile
--
2.7.3




Re: [Mesa-dev] [PATCH 2/4] swr: [rasterizer memory] hook up stencil clears for ClearTile

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 19, 2016, at 9:48 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp | 13 -
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp 
b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
index 8501e21..31a40a3 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
+++ b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
@@ -156,16 +156,19 @@ void StoreHotTileClear(
{
PFN_STORE_TILES_CLEAR pfnStoreTilesClear = NULL;

-SWR_ASSERT(renderTargetIndex != SWR_ATTACHMENT_STENCIL);  ///@todo Not 
supported yet.
-
-if (renderTargetIndex != SWR_ATTACHMENT_DEPTH)
+if (renderTargetIndex == SWR_ATTACHMENT_STENCIL)
{
-pfnStoreTilesClear = sStoreTilesClearColorTable[pDstSurface->format];
+SWR_ASSERT(pDstSurface->format == R8_UINT);
+pfnStoreTilesClear = StoreMacroTileClear::StoreClear;
}
-else
+else if (renderTargetIndex == SWR_ATTACHMENT_DEPTH)
{
pfnStoreTilesClear = sStoreTilesClearDepthTable[pDstSurface->format];
}
+else
+{
+pfnStoreTilesClear = sStoreTilesClearColorTable[pDstSurface->format];
+}

SWR_ASSERT(pfnStoreTilesClear != NULL);

--
2.7.3




Re: [Mesa-dev] [PATCH 3/4] swr: [rasterizer memory] only clear up to the LOD size

2016-11-28 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 19, 2016, at 9:48 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp | 10 --
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp 
b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
index 31a40a3..ee13f55 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
+++ b/src/gallium/drivers/swr/rasterizer/memory/ClearTile.cpp
@@ -60,6 +60,12 @@ struct StoreRasterTileClear
UINT x, UINT y, // (x, y) pixel coordinate to start of raster tile.
uint32_t renderTargetArrayIndex)
{
+// If we're outside of the surface, stop.
+uint32_t lodWidth = std::max(pDstSurface->width >> 
pDstSurface->lod, 1U);
+uint32_t lodHeight = std::max(pDstSurface->height >> 
pDstSurface->lod, 1U);
+if (x >= lodWidth || y >= lodHeight)
+return;
+
// Compute destination address for raster tile.
uint8_t* pDstTile = (uint8_t*)ComputeSurfaceAddress(
x, y, pDstSurface->arrayIndex + renderTargetArrayIndex,
@@ -73,7 +79,7 @@ struct StoreRasterTileClear
UINT dstBytesPerRow = 0;

// For each raster tile pixel in row 0 (rx, 0)
-for (UINT rx = 0; (rx < KNOB_TILE_X_DIM) && ((x + rx) < 
pDstSurface->width); ++rx)
+for (UINT rx = 0; (rx < KNOB_TILE_X_DIM) && ((x + rx) < lodWidth); 
++rx)
{
memcpy(pDst, dstFormattedColor, dstBytesPerPixel);

@@ -86,7 +92,7 @@ struct StoreRasterTileClear
pDst = pDstTile + pDstSurface->pitch;

// For each remaining row in the rest of the raster tile
-for (UINT ry = 1; (ry < KNOB_TILE_Y_DIM) && ((y + ry) < 
pDstSurface->height); ++ry)
+for (UINT ry = 1; (ry < KNOB_TILE_Y_DIM) && ((y + ry) < lodHeight); 
++ry)
{
// copy row
memcpy(pDst, pDstTile, dstBytesPerRow);
--
2.7.3
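
The bounds logic added by [PATCH 3/4] can be restated in a few lines.
minify() mirrors the max(size >> lod, 1) expression from the diff, and
clearWidth() is a hypothetical helper that returns how many pixels of
a tile row starting at x would actually be cleared.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Size of a mip level: halve per LOD, but never drop below one texel.
uint32_t minify(uint32_t size, uint32_t lod)
{
    assert(lod < 32); // shifting a 32-bit value by 32 or more is undefined
    return std::max<uint32_t>(size >> lod, 1);
}

// Number of pixels to clear in a tile row that starts at x.
uint32_t clearWidth(uint32_t x, uint32_t tileDim, uint32_t surfWidth,
                    uint32_t lod)
{
    const uint32_t lodWidth = minify(surfWidth, lod);
    if (x >= lodWidth)
        return 0; // tile lies entirely outside this mip level
    return std::min<uint32_t>(tileDim, lodWidth - x);
}
```

For a 64-wide surface at LOD 4 the level is only 4 texels wide, so an
8-wide tile at x = 0 clears 4 pixels, where clamping against the base
width would have cleared all 8.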




Re: [Mesa-dev] [PATCH 4/4] swr: [rasterizer core] use ClearTile helper to store fast clears

2016-11-28 Thread Rowley, Timothy O
This patch is showing some regressions in internal testing.  As we discussed 
on IRC, it appears to be a combination of crashes (probably missing table 
entries) and possibly wrong clear values.  I will need to get back to you later 
about the errors, but for now we need to hold off on this patch.

-Tim

> On Nov 19, 2016, at 9:48 AM, Ilia Mirkin  wrote:
> 
> No point in clearing the hot tile and then storing that - may as well
> just store the clear color to the surface directly. Use the helper that
> already exists for this purpose.
> 
> Signed-off-by: Ilia Mirkin 
> ---
> 
> My theory is that this is going to be a very modest perf improvement. Instead
> of first clearing the hot tile and then storing it, we store the clear color
> directly.
> 
> It does bring up a rare case where a tile might be cleared, stored, and then
> re-used with the same buffer. In that case, the former logic would avoid the
> load while the new logic will end up reloading the clear color/etc. There was
> a grand total of one piglit that was hit by this:
> 
>  fbo-attachments-blit-scaled-linear
> 
> (and that is the reason that we have to set the hottile to INVALID rather than
> the post state when storing.)
> 
> src/gallium/drivers/swr/rasterizer/core/backend.cpp | 17 ++---
> src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp | 15 +--
> 2 files changed, 15 insertions(+), 17 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
> index c45c0a7..ff08233 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
> @@ -358,16 +358,19 @@ void ProcessStoreTileBE(DRAW_CONTEXT *pDC, uint32_t 
> workerId, uint32_t macroTile
> HOTTILE *pHotTile = pContext->pHotTileMgr->GetHotTileNoLoad(pContext, 
> pDC, macroTile, attachment, false);
> if (pHotTile)
> {
> -// clear if clear is pending (i.e., not rendered to), then mark as 
> dirty for store.
> +// clear the surface directly
> if (pHotTile->state == HOTTILE_CLEAR)
> {
> -PFN_CLEAR_TILES pfnClearTiles = sClearTilesTable[srcFormat];
> -SWR_ASSERT(pfnClearTiles != nullptr);
> -
> -pfnClearTiles(pDC, attachment, macroTile, 
> pHotTile->renderTargetArrayIndex, pHotTile->clearData, pDesc->rect);
> +pContext->pfnClearTile(GetPrivateState(pDC), attachment,
> +x * KNOB_MACROTILE_X_DIM, y * KNOB_MACROTILE_Y_DIM,
> +pHotTile->renderTargetArrayIndex,
> +(const float *)pHotTile->clearData);
> +
> +// Since the state is effectively uninitialized, make sure that 
> we
> +// reload any data.
> +pHotTile->state = HOTTILE_INVALID;
> }
> -
> -if (pHotTile->state == HOTTILE_DIRTY || pDesc->postStoreTileState == 
> (SWR_TILE_STATE)HOTTILE_DIRTY)
> +else if (pHotTile->state == HOTTILE_DIRTY || 
> pDesc->postStoreTileState == (SWR_TILE_STATE)HOTTILE_DIRTY)
> {
> int32_t destX = KNOB_MACROTILE_X_DIM * x;
> int32_t destY = KNOB_MACROTILE_Y_DIM * y;
> diff --git a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
> index f398667..a4a6152 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/tilemgr.cpp
> @@ -151,17 +151,12 @@ HOTTILE* HotTileMgr::GetHotTile(SWR_CONTEXT* pContext, 
> DRAW_CONTEXT* pDC, uint32
> 
> if (hotTile.state == HOTTILE_CLEAR)
> {
> -if (attachment == SWR_ATTACHMENT_STENCIL)
> -ClearStencilHotTile(&hotTile);
> -else if (attachment == SWR_ATTACHMENT_DEPTH)
> -ClearDepthHotTile(&hotTile);
> -else
> -ClearColorHotTile(&hotTile);
> -
> -hotTile.state = HOTTILE_DIRTY;
> +pContext->pfnClearTile(GetPrivateState(pDC), attachment,
> +x * KNOB_MACROTILE_X_DIM, y * KNOB_MACROTILE_Y_DIM,
> +hotTile.renderTargetArrayIndex,
> +(const float *)hotTile.clearData);
> }
> -
> -if (hotTile.state == HOTTILE_DIRTY)
> +else if (hotTile.state == HOTTILE_DIRTY)
> {
> pContext->pfnStoreTile(GetPrivateState(pDC), format, 
> attachment,
> x * KNOB_MACROTILE_X_DIM, y * KNOB_MACROTILE_Y_DIM, 
> hotTile.renderTargetArrayIndex, hotTile.pBuffer);
> -- 
> 2.7.3
> 
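
A simplified model of the store path proposed in the quoted patch,
only to make the INVALID transition concrete. TileState, HotTile and
storeTileDirect() are hypothetical names. The clear color is written
straight to the surface, so the tile's own buffer never receives it
and has to be treated as uninitialized afterwards.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class TileState { Invalid, Clear, Dirty };

// Hypothetical, simplified hot tile.
struct HotTile {
    TileState state;
    uint32_t clearValue;
    std::vector<uint32_t> pixels;
};

void storeTileDirect(HotTile &tile, std::vector<uint32_t> &surface)
{
    if (tile.state == TileState::Clear) {
        // Skip the intermediate fill of tile.pixels and write the clear
        // color to the destination directly.
        std::fill(surface.begin(), surface.end(), tile.clearValue);

        // tile.pixels still holds stale data, so force a reload on reuse.
        tile.state = TileState::Invalid;
    } else if (tile.state == TileState::Dirty) {
        surface = tile.pixels;
    }
}
```

This is the trade-off described in the quoted notes: one copy is
saved, but a tile that is reused with the same buffer has to be loaded
again.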



Re: [Mesa-dev] [PATCH 1/6] swr: only store up to the LOD size

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_draw.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_draw.cpp 
b/src/gallium/drivers/swr/swr_draw.cpp
index e8c5b23..c4d5e5c 100644
--- a/src/gallium/drivers/swr/swr_draw.cpp
+++ b/src/gallium/drivers/swr/swr_draw.cpp
@@ -259,7 +259,9 @@ swr_store_render_target(struct pipe_context *pipe,
   if (renderTarget->pBaseAddress) {
  swr_update_draw_context(ctx);
  SWR_RECT full_rect =
- {0, 0, (int32_t)renderTarget->width, (int32_t)renderTarget->height};
+ {0, 0,
+  (int32_t)u_minify(renderTarget->width, renderTarget->lod),
+  (int32_t)u_minify(renderTarget->height, renderTarget->lod)};
  SwrStoreTiles(ctx->swrContext,
1 << attachment,
post_tile_state,
--
2.7.3




Re: [Mesa-dev] [PATCH 2/6] swr: rearrange caps into limits/supported/unsupported groups

2016-11-29 Thread Rowley, Timothy O
Ouch, that must have been a pain to reorganize - thanks.  Visual inspection 
says the caps are the same before and after, and testing shows it still passing 
the same tests.

Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

I find this a lot more readable and compact - much easier to scan
through the list and see what's on and what's off.

No functional change intended.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 213 +
1 file changed, 84 insertions(+), 129 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 194b8f0..dc55d3e 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -153,54 +153,15 @@ static int
swr_get_param(struct pipe_screen *screen, enum pipe_cap param)
{
   switch (param) {
-   case PIPE_CAP_NPOT_TEXTURES:
-   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
-   case PIPE_CAP_MIXED_COLOR_DEPTH_BITS:
-  return 1;
-   case PIPE_CAP_TWO_SIDED_STENCIL:
-  return 1;
-   case PIPE_CAP_SM3:
-  return 1;
-   case PIPE_CAP_ANISOTROPIC_FILTER:
-  return 0;
-   case PIPE_CAP_POINT_SPRITE:
-  return 1;
+  /* limits */
   case PIPE_CAP_MAX_RENDER_TARGETS:
  return PIPE_MAX_COLOR_BUFS;
-   case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
-  return 1;
-   case PIPE_CAP_OCCLUSION_QUERY:
-   case PIPE_CAP_QUERY_TIME_ELAPSED:
-   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
-  return 1;
-   case PIPE_CAP_TEXTURE_MIRROR_CLAMP:
-  return 1;
-   case PIPE_CAP_TEXTURE_SHADOW_MAP:
-  return 1;
-   case PIPE_CAP_TEXTURE_SWIZZLE:
-  return 1;
-   case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
-  return 0;
   case PIPE_CAP_MAX_TEXTURE_2D_LEVELS:
  return SWR_MAX_TEXTURE_2D_LEVELS;
   case PIPE_CAP_MAX_TEXTURE_3D_LEVELS:
  return SWR_MAX_TEXTURE_3D_LEVELS;
   case PIPE_CAP_MAX_TEXTURE_CUBE_LEVELS:
  return SWR_MAX_TEXTURE_CUBE_LEVELS;
-   case PIPE_CAP_BLEND_EQUATION_SEPARATE:
-  return 1;
-   case PIPE_CAP_INDEP_BLEND_ENABLE:
-  return 1;
-   case PIPE_CAP_INDEP_BLEND_FUNC:
-  return 1;
-   case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
-  return 0; // Don't support lower left frag coord.
-   case PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT:
-   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
-   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
-  return 1;
-   case PIPE_CAP_DEPTH_CLIP_DISABLE:
-  return 1;
   case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
  return MAX_SO_STREAMS;
   case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
@@ -213,134 +174,112 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
  return 1;
   case PIPE_CAP_MAX_VERTEX_ATTRIB_STRIDE:
  return 2048;
-   case PIPE_CAP_PRIMITIVE_RESTART:
-  return 1;
-   case PIPE_CAP_SHADER_STENCIL_EXPORT:
-  return 0;
-   case PIPE_CAP_TGSI_INSTANCEID:
-   case PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR:
-   case PIPE_CAP_START_INSTANCE:
-  return 1;
-   case PIPE_CAP_SEAMLESS_CUBE_MAP:
-   case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
-  return 1;
   case PIPE_CAP_MAX_TEXTURE_ARRAY_LAYERS:
  return SWR_MAX_TEXTURE_ARRAY_LAYERS;
   case PIPE_CAP_MIN_TEXEL_OFFSET:
  return -8;
   case PIPE_CAP_MAX_TEXEL_OFFSET:
  return 7;
-   case PIPE_CAP_CONDITIONAL_RENDER:
-  return 1;
-   case PIPE_CAP_TEXTURE_BARRIER:
+   case PIPE_CAP_GLSL_FEATURE_LEVEL:
+  return 330;
+   case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
+  return 16;
+   case PIPE_CAP_MIN_MAP_BUFFER_ALIGNMENT:
+  return 64;
+   case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
+  return 65536;
+   case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
  return 0;
-   case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
-   case PIPE_CAP_VERTEX_COLOR_CLAMPED:
+   case PIPE_CAP_MAX_VIEWPORTS:
+  return 1;
+   case PIPE_CAP_ENDIANNESS:
+  return PIPE_ENDIAN_NATIVE;
+   case PIPE_CAP_MIN_TEXTURE_GATHER_OFFSET:
+   case PIPE_CAP_MAX_TEXTURE_GATHER_OFFSET:
  return 0;
+
+  /* supported features */
+   case PIPE_CAP_NPOT_TEXTURES:
+   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
+   case PIPE_CAP_MIXED_COLOR_DEPTH_BITS:
+   case PIPE_CAP_TWO_SIDED_STENCIL:
+   case PIPE_CAP_SM3:
+   case PIPE_CAP_POINT_SPRITE:
+   case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
+   case PIPE_CAP_OCCLUSION_QUERY:
+   case PIPE_CAP_QUERY_TIME_ELAPSED:
+   case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
+   case PIPE_CAP_TEXTURE_MIRROR_CLAMP:
+   case PIPE_CAP_TEXTURE_SHADOW_MAP:
+   case PIPE_CAP_TEXTURE_SWIZZLE:
+   case PIPE_CAP_BLEND_EQUATION_SEPARATE:
+   case PIPE_CAP_INDEP_BLEND_ENABLE:
+   case PIPE_CAP_INDEP_BLEND_FUNC:
+   case PIPE_CAP_TGSI_FS_COORD_ORIGIN_UPPER_LEFT:
+   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER:
+   case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER:
+   case PIPE_CAP_DEPTH_CLIP_DISABLE:
+   cas

Re: [Mesa-dev] [PATCH 4/6] swr: use util_copy_framebuffer_state helper

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_state.cpp | 13 +
1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 4119379..8541aca 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -617,18 +617,7 @@ swr_set_framebuffer_state(struct pipe_context *pipe,
   assert(fb->height <= KNOB_GUARDBAND_HEIGHT);

   if (changed) {
-  unsigned i;
-  for (i = 0; i < fb->nr_cbufs; ++i)
- pipe_surface_reference(&ctx->framebuffer.cbufs[i], fb->cbufs[i]);
-  for (; i < ctx->framebuffer.nr_cbufs; ++i)
- pipe_surface_reference(&ctx->framebuffer.cbufs[i], NULL);
-
-  ctx->framebuffer.nr_cbufs = fb->nr_cbufs;
-
-  ctx->framebuffer.width = fb->width;
-  ctx->framebuffer.height = fb->height;
-
-  pipe_surface_reference(&ctx->framebuffer.zsbuf, fb->zsbuf);
+  util_copy_framebuffer_state(&ctx->framebuffer, fb);

  ctx->dirty |= SWR_NEW_FRAMEBUFFER;
   }
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
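
The open-coded loop removed by patch 4/6 can be sketched in isolation as follows. This is only an illustration of the copy-with-references pattern: std::shared_ptr stands in for pipe_surface_reference(), and Surface, FramebufferState, kMaxColorBufs and copy_framebuffer_state() are hypothetical names, not the real util_copy_framebuffer_state() implementation.

```cpp
#include <array>
#include <cstddef>
#include <memory>

struct Surface { int id; };

constexpr std::size_t kMaxColorBufs = 8; // assumed limit for the sketch

struct FramebufferState {
   unsigned width = 0;
   unsigned height = 0;
   unsigned nr_cbufs = 0;
   std::array<std::shared_ptr<Surface>, kMaxColorBufs> cbufs;
   std::shared_ptr<Surface> zsbuf;
};

// Copy src into dst, taking a reference on every bound surface and
// dropping references dst held on slots that src does not use.
static void copy_framebuffer_state(FramebufferState& dst,
                                   const FramebufferState& src)
{
   dst.width = src.width;
   dst.height = src.height;

   for (std::size_t i = 0; i < src.nr_cbufs; ++i)
      dst.cbufs[i] = src.cbufs[i];
   for (std::size_t i = src.nr_cbufs; i < dst.cbufs.size(); ++i)
      dst.cbufs[i].reset();

   dst.nr_cbufs = src.nr_cbufs;
   dst.zsbuf = src.zsbuf;
}
```

Keeping this in one helper means every caller releases stale color buffers the same way.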


Re: [Mesa-dev] [PATCH 3/6] swr: enable cubemap arrays

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Everything is in place for these.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index dc55d3e..b17faee 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -247,6 +247,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
   case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
   case PIPE_CAP_CULL_DISTANCE:
+   case PIPE_CAP_CUBE_MAP_ARRAY:
  return 1;

  /* unsupported features */
@@ -264,7 +265,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
   case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
   case PIPE_CAP_TEXTURE_MULTISAMPLE:
-   case PIPE_CAP_CUBE_MAP_ARRAY:
   case PIPE_CAP_TGSI_TEXCOORD:
   case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
   case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
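
To illustrate the grouped layout that patches 2/6 and 3/6 rely on, here is a small sketch of a capability query where enabling a feature is just a matter of moving its case label from the "unsupported" group to the "supported" group. The Cap enum and get_param() are hypothetical stand-ins, far smaller than the real pipe_cap list.

```cpp
// Hypothetical, heavily reduced capability list.
enum class Cap {
   MaxViewports,
   NpotTextures,
   CubeMapArray,
   TextureMultisample,
};

static int get_param(Cap cap)
{
   switch (cap) {
   /* limits */
   case Cap::MaxViewports:
      return 1;

   /* supported features */
   case Cap::NpotTextures:
   case Cap::CubeMapArray: // moved here from the group below to enable it
      return 1;

   /* unsupported features */
   case Cap::TextureMultisample:
      return 0;
   }
   return 0; // unknown caps are treated as unsupported in this sketch
}
```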


Re: [Mesa-dev] [PATCH 5/6] swr: reorder renderable formats, add grouping comments

2016-11-29 Thread Rowley, Timothy O
I’ve verified the same entries are in the list before/after.

Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 152 +++--
1 file changed, 87 insertions(+), 65 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index b17faee..642f9be 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -377,89 +377,141 @@ SWR_FORMAT
mesa_to_swr_format(enum pipe_format format)
{
   static const std::map<pipe_format, SWR_FORMAT> mesa2swr = {
-  {PIPE_FORMAT_B8G8R8A8_UNORM, B8G8R8A8_UNORM},
-  {PIPE_FORMAT_B8G8R8X8_UNORM, B8G8R8X8_UNORM},
-  {PIPE_FORMAT_B5G5R5A1_UNORM, B5G5R5A1_UNORM},
-  {PIPE_FORMAT_B4G4R4A4_UNORM, B4G4R4A4_UNORM},
-  {PIPE_FORMAT_B5G6R5_UNORM,   B5G6R5_UNORM},
-  {PIPE_FORMAT_R10G10B10A2_UNORM,  R10G10B10A2_UNORM},
-  {PIPE_FORMAT_A8_UNORM,   A8_UNORM},
+  /* depth / stencil */
  {PIPE_FORMAT_Z16_UNORM,  R16_UNORM}, // z
  {PIPE_FORMAT_Z32_FLOAT,  R32_FLOAT}, // z
  {PIPE_FORMAT_Z24_UNORM_S8_UINT,  R24_UNORM_X8_TYPELESS}, // z
  {PIPE_FORMAT_Z24X8_UNORM,R24_UNORM_X8_TYPELESS}, // z
+  {PIPE_FORMAT_Z32_FLOAT_S8X24_UINT,   R32_FLOAT_X8X24_TYPELESS}, // z
+
+  /* alpha */
+  {PIPE_FORMAT_A8_UNORM,   A8_UNORM},
+  {PIPE_FORMAT_A16_UNORM,  A16_UNORM},
+  {PIPE_FORMAT_A16_FLOAT,  A16_FLOAT},
+  {PIPE_FORMAT_A32_FLOAT,  A32_FLOAT},
+
+  /* odd sizes, bgr */
+  {PIPE_FORMAT_B5G6R5_UNORM,   B5G6R5_UNORM},
+  {PIPE_FORMAT_B5G6R5_SRGB,B5G6R5_UNORM_SRGB},
+  {PIPE_FORMAT_B5G5R5A1_UNORM, B5G5R5A1_UNORM},
+  {PIPE_FORMAT_B5G5R5X1_UNORM, B5G5R5X1_UNORM},
+  {PIPE_FORMAT_B4G4R4A4_UNORM, B4G4R4A4_UNORM},
+  {PIPE_FORMAT_B8G8R8A8_UNORM, B8G8R8A8_UNORM},
+  {PIPE_FORMAT_B8G8R8A8_SRGB,  B8G8R8A8_UNORM_SRGB},
+  {PIPE_FORMAT_B8G8R8X8_UNORM, B8G8R8X8_UNORM},
+  {PIPE_FORMAT_B8G8R8X8_SRGB,  B8G8R8X8_UNORM_SRGB},
+
+  /* rgb10a2 */
+  {PIPE_FORMAT_R10G10B10A2_UNORM,  R10G10B10A2_UNORM},
+  {PIPE_FORMAT_R10G10B10A2_SNORM,  R10G10B10A2_SNORM},
+  {PIPE_FORMAT_R10G10B10A2_USCALED,R10G10B10A2_USCALED},
+  {PIPE_FORMAT_R10G10B10A2_SSCALED,R10G10B10A2_SSCALED},
+  {PIPE_FORMAT_R10G10B10A2_UINT,   R10G10B10A2_UINT},
+
+  /* rgb10x2 */
+  {PIPE_FORMAT_R10G10B10X2_USCALED,R10G10B10X2_USCALED},
+
+  /* bgr10a2 */
+  {PIPE_FORMAT_B10G10R10A2_UNORM,  B10G10R10A2_UNORM},
+  {PIPE_FORMAT_B10G10R10A2_SNORM,  B10G10R10A2_SNORM},
+  {PIPE_FORMAT_B10G10R10A2_USCALED,B10G10R10A2_USCALED},
+  {PIPE_FORMAT_B10G10R10A2_SSCALED,B10G10R10A2_SSCALED},
+  {PIPE_FORMAT_B10G10R10A2_UINT,   B10G10R10A2_UINT},
+
+  /* bgr10x2 */
+  {PIPE_FORMAT_B10G10R10X2_UNORM,  B10G10R10X2_UNORM},
+
+  /* r11g11b10 */
+  {PIPE_FORMAT_R11G11B10_FLOAT,R11G11B10_FLOAT},
+
+  /* 32 bits per component */
  {PIPE_FORMAT_R32_FLOAT,  R32_FLOAT},
  {PIPE_FORMAT_R32G32_FLOAT,   R32G32_FLOAT},
  {PIPE_FORMAT_R32G32B32_FLOAT,R32G32B32_FLOAT},
  {PIPE_FORMAT_R32G32B32A32_FLOAT, R32G32B32A32_FLOAT},
+  {PIPE_FORMAT_R32G32B32X32_FLOAT, R32G32B32X32_FLOAT},
+
  {PIPE_FORMAT_R32_USCALED,R32_USCALED},
  {PIPE_FORMAT_R32G32_USCALED, R32G32_USCALED},
  {PIPE_FORMAT_R32G32B32_USCALED,  R32G32B32_USCALED},
  {PIPE_FORMAT_R32G32B32A32_USCALED,   R32G32B32A32_USCALED},
+
  {PIPE_FORMAT_R32_SSCALED,R32_SSCALED},
  {PIPE_FORMAT_R32G32_SSCALED, R32G32_SSCALED},
  {PIPE_FORMAT_R32G32B32_SSCALED,  R32G32B32_SSCALED},
  {PIPE_FORMAT_R32G32B32A32_SSCALED,   R32G32B32A32_SSCALED},
+
+  {PIPE_FORMAT_R32_UINT,   R32_UINT},
+  {PIPE_FORMAT_R32G32_UINT,R32G32_UINT},
+  {PIPE_FORMAT_R32G32B32_UINT, R32G32B32_UINT},
+  {PIPE_FORMAT_R32G32B32A32_UINT,  R32G32B32A32_UINT},
+
+  {PIPE_FORMAT_R32_SINT,   R32_SINT},
+  {PIPE_FORMAT_R32G32_SINT,R32G32_SINT},
+  {PIPE_FORMAT_R32G32B32_SINT, R32G32B32_SINT},
+  {PIPE_FORMAT_R32G32B32A32_SINT,  R32G32B32A32_SINT},
+
+  /* 16 bits per component */
  {PIPE_FORMAT_R16_UNORM,  R16_UNORM},
  {PIPE_FORMAT_R16G16_UNORM,   R16G16_UNORM},
  {PIPE_FORMAT_R16G16B16_UNORM,R16G16B16_UNORM},
  {PIPE_FORMAT_R16G16B16A16_UNORM, R16G16B16A16_UNORM},
+  {PIPE_FORMAT_R16G16B16X16_UNORM, R16G16B16X16_UNORM},
+
  {PIPE_FORMAT_R16_USCALED,   

Re: [Mesa-dev] [PATCH 6/6] swr: add missing rgbx8_srgb variant

2016-11-29 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 22, 2016, at 7:37 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 642f9be..19bb102 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -497,6 +497,7 @@ mesa_to_swr_format(enum pipe_format format)
  {PIPE_FORMAT_R8G8B8A8_UNORM, R8G8B8A8_UNORM},
  {PIPE_FORMAT_R8G8B8A8_SRGB,  R8G8B8A8_UNORM_SRGB},
  {PIPE_FORMAT_R8G8B8X8_UNORM, R8G8B8X8_UNORM},
+  {PIPE_FORMAT_R8G8B8X8_SRGB,  R8G8B8X8_UNORM_SRGB},

  {PIPE_FORMAT_R8_USCALED, R8_USCALED},
  {PIPE_FORMAT_R8G8_USCALED,   R8G8_USCALED},
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
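
A table-driven format translation like the one touched by patches 5/6 and 6/6 can be sketched as below. The enums are hypothetical two-entry stand-ins for pipe_format and SWR_FORMAT, and returning a sentinel for unlisted formats is an assumption about how a caller would detect a missing entry such as the rgbx8_srgb variant added here.

```cpp
#include <map>

// Hypothetical, reduced format enums for the sketch.
enum PipeFormat { FMT_R8G8B8X8_UNORM, FMT_R8G8B8X8_SRGB, FMT_UNLISTED };
enum SwrFormat {
   SWR_INVALID = -1,
   SWR_R8G8B8X8_UNORM,
   SWR_R8G8B8X8_UNORM_SRGB,
};

// Look the format up in a static table; formats without an entry map to
// SWR_INVALID so the caller can report them as unsupported.
static SwrFormat to_swr_format(PipeFormat format)
{
   static const std::map<PipeFormat, SwrFormat> table = {
      {FMT_R8G8B8X8_UNORM, SWR_R8G8B8X8_UNORM},
      {FMT_R8G8B8X8_SRGB,  SWR_R8G8B8X8_UNORM_SRGB},
   };

   const auto it = table.find(format);
   return it == table.end() ? SWR_INVALID : it->second;
}
```

A format missing from the table is not a compile error, which is how a single variant can go unnoticed until a test asks for it.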


Re: [Mesa-dev] [PATCH] swr: [rasterizer jit] use signed integer representation for logic op

2016-11-29 Thread Rowley, Timothy O

On Nov 27, 2016, at 11:13 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

On Thu, Nov 24, 2016 at 6:11 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
Instead of (incorrectly) biasing the snorm value to make it look like a
unorm, just use signed integer math.

This fixes arb_color_buffer_float-render GL_RGBA8_SNORM

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp | 17 -
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
index ad809c4..339ca52 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
@@ -692,9 +692,13 @@ struct BlendJit : public Builder
dst[i] = BITCAST(dst[i], mSimdInt32Ty);
break;
case SWR_TYPE_SNORM:
-src[i] = FADD(src[i], VIMMED1(0.5f));
-dst[i] = FADD(dst[i], VIMMED1(0.5f));
-/* fallthrough */
+src[i] = FP_TO_SI(
+FMUL(src[i], VIMMED1(scale[i])),
+mSimdInt32Ty);
+dst[i] = FP_TO_SI(
+FMUL(dst[i], VIMMED1(scale[i])),
+mSimdInt32Ty);
+break;
case SWR_TYPE_UNORM:
src[i] = FP_TO_UI(
FMUL(src[i], VIMMED1(scale[i])),
@@ -728,11 +732,14 @@ struct BlendJit : public Builder
result[i] = BITCAST(result[i], mSimdFP32Ty);
break;
case SWR_TYPE_SNORM:
+result[i] = SHL(result[i], 32 - info.bpc[i]);
+result[i] = ASHR(result[i], 32 - info.bpc[i]);

These two immediate arguments should probably have a C() around them.
I've fixed that up in my tree. Hopefully these will emit as VPSLLD and
VPSRAD. Not sure how to check that.

With the version of the patch from your tree, I’m seeing this IR:

  %24 = ashr exact <8 x i32> %23, i32 24
  %25 = sitofp <8 x i32> %24 to <8 x float>
  %26 = fmul <8 x float> %25, 
  store <8 x float> %26, <8 x float>* %result, align 32

Turn into this x86 code:

  9a:   vpslld ymm1,ymm3,0x18
  9f:   vpsrad ymm1,ymm1,0x18
  a4:   vcvtdq2ps ymm1,ymm1
  a8:   vmulps ymm1,ymm1,ymm2
  ac:   vmovaps YMMWORD PTR [rax+0x20],ymm1

So llvm does what you expected.

Version of this patch from your tree Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>



+result[i] = FMUL(SI_TO_FP(result[i], mSimdFP32Ty),
+ VIMMED1(1.0f / scale[i]));
+break;
case SWR_TYPE_UNORM:
result[i] = FMUL(UI_TO_FP(result[i], mSimdFP32Ty),
 VIMMED1(1.0f / scale[i]));
-if (info.type[i] == SWR_TYPE_SNORM)
-result[i] = FADD(result[i], VIMMED1(-0.5f));
break;
}

--
2.7.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
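
The SHL/ASHR pair discussed in this thread is a sign extension of the low bpc bits of the logic-op result. A scalar sketch of the same trick, under the assumption that each lane is a 32-bit integer; sign_extend() is a hypothetical helper, not part of the JIT builder.

```cpp
#include <cstdint>

// Sign-extend the low `bits` bits of `value` (1 <= bits <= 32).
// The left shift is done on an unsigned value to avoid signed overflow.
// The conversion to int32_t and the right shift of a negative value are
// arithmetic on mainstream compilers (and guaranteed from C++20 on).
static inline int32_t sign_extend(uint32_t value, unsigned bits)
{
   const unsigned shift = 32u - bits;
   return static_cast<int32_t>(value << shift) >> shift;
}
```

For an 8-bit channel the shift amount is 24, which matches the 0x18 immediates in the vpslld/vpsrad pair quoted above.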


Re: [Mesa-dev] [PATCH] swr: [rasterizer memory] assert when trying to convert an unknown format

2016-11-30 Thread Rowley, Timothy O
Not seeing this assert fire on our tests either.

Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:04 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

I've been running this for a little while and haven't hit it. I had a theory
at one point that there was a missing format in there which turned out to be
false, but I think this is still good to have rather than silently fail.

src/gallium/drivers/swr/rasterizer/memory/Convert.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/swr/rasterizer/memory/Convert.h 
b/src/gallium/drivers/swr/rasterizer/memory/Convert.h
index c31459c..527324c 100644
--- a/src/gallium/drivers/swr/rasterizer/memory/Convert.h
+++ b/src/gallium/drivers/swr/rasterizer/memory/Convert.h
@@ -724,6 +724,7 @@ INLINE static void ConvertPixelFromFloat(
case R8G8B8_SINT: ConvertPixelFromFloat<R8G8B8_SINT>(pDst, srcPixel); break;
case RAW: ConvertPixelFromFloat<RAW>(pDst, srcPixel); break;
default:
+SWR_ASSERT(0);
break;
}
}
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swr: [rasterizer core] don't attempt to load another RTAI when storing

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 16, 2016, at 9:04 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Since we don't pass a renderTargetArrayIndex in, and the current hot
tile may be for a different index, we may end up loading the RTAI=0 into
the hot tile for no reason.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

Noticed this when doing an audit of GetHotTile calls without a 
renderTargetArrayIndex being passed in. In this case, I don't think it should 
be loading at all...

Note that this has not been rigorously tested.

src/gallium/drivers/swr/rasterizer/core/backend.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/backend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
index 3375585..29d0ff5 100644
--- a/src/gallium/drivers/swr/rasterizer/core/backend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/backend.cpp
@@ -361,7 +361,7 @@ void ProcessStoreTileBE(DRAW_CONTEXT *pDC, uint32_t 
workerId, uint32_t macroTile
MacroTileMgr::getTileIndices(macroTile, x, y);

// Only need to store the hottile if it's been rendered to...
-HOTTILE *pHotTile = pContext->pHotTileMgr->GetHotTile(pContext, pDC, 
macroTile, attachment, false);
+HOTTILE *pHotTile = pContext->pHotTileMgr->GetHotTileNoLoad(pContext, pDC, 
macroTile, attachment, false);
if (pHotTile)
{
// clear if clear is pending (i.e., not rendered to), then mark as 
dirty for store.
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] swr: remove warning about multi-layer surfaces

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:05 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

We now support clearing these, and actually rendering to multiple layers
would require GS support, which will fail in much more spectacular ways
for now. Once that is hooked up, there won't be anything else to do
here.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_context.cpp | 4 
1 file changed, 4 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_context.cpp 
b/src/gallium/drivers/swr/swr_context.cpp
index 5a1927c..b355bba 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -62,10 +62,6 @@ swr_create_surface(struct pipe_context *pipe,
 ps->u.tex.level = surf_tmpl->u.tex.level;
 ps->u.tex.first_layer = surf_tmpl->u.tex.first_layer;
 ps->u.tex.last_layer = surf_tmpl->u.tex.last_layer;
- if (ps->u.tex.first_layer != ps->u.tex.last_layer) {
-debug_printf("creating surface with multiple layers, rendering "
- "to first layer only\n");
- }
  } else {
 /* setting width as number of elements should get us correct
  * renderbuffer width */
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] swr: don't advertise stream pause/resume

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

There is no support for resuming streamout. Furthermore, this also
controls glDrawTransformFeedback functionality which requires the same
ability to query how many primitives were sent out of TF.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

I have a partially-working patch for bringing this back, but it's not quite
100% yet - some sort of concurrency issue I have yet to track down.

However in the current state, this is just totally not supported by the FE
(but the swr core does do this).

src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index 19bb102..e184548 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -232,7 +232,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_USER_VERTEX_BUFFERS:
   case PIPE_CAP_USER_INDEX_BUFFERS:
   case PIPE_CAP_USER_CONSTANT_BUFFERS:
-   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
   case PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS:
   case PIPE_CAP_QUERY_TIMESTAMP:
   case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
@@ -311,6 +310,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
   case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
   case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
   case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
+   case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
  return 0;

   case PIPE_CAP_VENDOR_ID:
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/5] swr: properly report max number of SO components

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The components count the number of individual values, not the number of
slots.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_screen.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index e184548..2388922 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -166,7 +166,7 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
  return MAX_SO_STREAMS;
   case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
   case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-  return MAX_ATTRIBUTES;
+  return MAX_ATTRIBUTES * 4;
   case PIPE_CAP_MAX_GEOMETRY_OUTPUT_VERTICES:
   case PIPE_CAP_MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS:
  return 1024;
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] swr: fix assertion for max number of so targets

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The number has to be less than or equal to the max, not just less than.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_state.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index 9f6b5b0..fc835dc 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1570,7 +1570,7 @@ swr_set_so_targets(struct pipe_context *pipe,
   struct swr_context *swr = swr_context(pipe);
   uint32_t i;

-   assert(num_targets < MAX_SO_STREAMS);
+   assert(num_targets <= MAX_SO_STREAMS);

   for (i = 0; i < num_targets; i++) {
  pipe_so_target_reference(
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
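
The off-by-one fixed in patch 4/6 is easy to reproduce in a reduced form. In this sketch, binding exactly the maximum number of targets is legal, so the precondition has to use <= rather than <. kMaxSoStreams, SoState and set_so_targets() are hypothetical names, and plain ints stand in for the target pointers.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaxSoStreams = 4; // assumed limit for the sketch

struct SoState {
   int targets[kMaxSoStreams];
   uint32_t num_targets;
};

// Bind num_targets targets and unbind the remaining slots (0 = unbound).
static void set_so_targets(SoState& state, const int* targets,
                           uint32_t num_targets)
{
   // Filling every slot is valid, so the bound itself must be accepted.
   assert(num_targets <= kMaxSoStreams);

   for (uint32_t i = 0; i < num_targets; ++i)
      state.targets[i] = targets[i];
   for (uint32_t i = num_targets; i < kMaxSoStreams; ++i)
      state.targets[i] = 0;

   state.num_targets = num_targets;
}
```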


Re: [Mesa-dev] [PATCH 2/5] swr: turn off queries around blits

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/swr_context.cpp | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_context.cpp 
b/src/gallium/drivers/swr/swr_context.cpp
index b355bba..b8c87fa 100644
--- a/src/gallium/drivers/swr/swr_context.cpp
+++ b/src/gallium/drivers/swr/swr_context.cpp
@@ -301,7 +301,10 @@ swr_blit(struct pipe_context *pipe, const struct 
pipe_blit_info *blit_info)
  return;
   }

-   /* XXX turn off occlusion and streamout queries */
+   if (ctx->active_queries) {
+  SwrEnableStatsFE(ctx->swrContext, FALSE);
+  SwrEnableStatsBE(ctx->swrContext, FALSE);
+   }

   util_blitter_save_vertex_buffer_slot(ctx->blitter, ctx->vertex_buffer);
   util_blitter_save_vertex_elements(ctx->blitter, (void *)ctx->velems);
@@ -335,6 +338,11 @@ swr_blit(struct pipe_context *pipe, const struct 
pipe_blit_info *blit_info)
  ctx->render_cond_mode);

   util_blitter_blit(ctx->blitter, &info);
+
+   if (ctx->active_queries) {
+  SwrEnableStatsFE(ctx->swrContext, TRUE);
+  SwrEnableStatsBE(ctx->swrContext, TRUE);
+   }
}


--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
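
The disable/re-enable pairing around the blit in patch 2/6 could also be expressed as a scope guard. The sketch below is an alternative formulation, not what the patch does: StatsState stands in for the front-end and back-end statistics switches, and ScopedStatsPause is a hypothetical helper.

```cpp
// Hypothetical stand-in for the FE/BE statistics enables.
struct StatsState {
   bool fe_enabled = true;
   bool be_enabled = true;
};

// Turns statistics off for the lifetime of the guard, but only when
// queries are active, and turns them back on when the guard is destroyed.
class ScopedStatsPause {
public:
   ScopedStatsPause(StatsState& stats, unsigned active_queries)
      : stats_(stats), paused_(active_queries > 0)
   {
      if (paused_) {
         stats_.fe_enabled = false;
         stats_.be_enabled = false;
      }
   }

   ~ScopedStatsPause()
   {
      if (paused_) {
         stats_.fe_enabled = true;
         stats_.be_enabled = true;
      }
   }

   ScopedStatsPause(const ScopedStatsPause&) = delete;
   ScopedStatsPause& operator=(const ScopedStatsPause&) = delete;

private:
   StatsState& stats_;
   bool paused_;
};
```

The guard restores the enables on every exit path, which matters if an early return is ever added between the two halves.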


Re: [Mesa-dev] [PATCH 5/5] swr: add streamout buffer offset into pBuffer pointer

2016-11-30 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Nov 29, 2016, at 8:23 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

The buffer_size does not take the offset into account. Just add the
offset into the pointer which lines up the structures much better.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---

This doesn't really fix anything right now, but logically the streamOffset
is incremented on each draw, and is optionally written back out as a watermark
indicator (for pausing/resuming streams). So it should be relative to the
logical start of the buffer.

src/gallium/drivers/swr/swr_state.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index fc835dc..4475252 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1488,10 +1488,11 @@ swr_update_derived(struct pipe_context *pipe,
continue;
 buffer.enable = true;
 buffer.pBuffer =
-(uint32_t *)swr_resource_data(ctx->so_targets[i]->buffer);
+(uint32_t *)(swr_resource_data(ctx->so_targets[i]->buffer) +
+ ctx->so_targets[i]->buffer_offset);
 buffer.bufferSize = ctx->so_targets[i]->buffer_size >> 2;
 buffer.pitch = stream_output->stride[i];
- buffer.streamOffset = ctx->so_targets[i]->buffer_offset >> 2;
+ buffer.streamOffset = 0;

 SwrSetSoBuffers(ctx->swrContext, &buffer, i);
  }
--
2.7.3


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
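
A reduced sketch of the bookkeeping change in patch 5/5: the byte offset is folded into the base pointer, so the size (in dwords) and the running stream offset are both relative to the logical start of the buffer. SoBuffer and make_so_buffer() are hypothetical; the field names follow the patch, but the struct is not the real SWR one.

```cpp
#include <cstdint>

// Hypothetical, reduced streamout buffer description.
struct SoBuffer {
   uint32_t* pBuffer;      // logical start of the target
   uint32_t bufferSize;    // size in dwords
   uint32_t streamOffset;  // write position in dwords, relative to pBuffer
};

// byte_offset is assumed to be a multiple of 4 so the resulting pointer
// stays dword aligned.
static SoBuffer make_so_buffer(uint8_t* resource_data, uint32_t byte_offset,
                               uint32_t byte_size)
{
   SoBuffer buffer;
   buffer.pBuffer = reinterpret_cast<uint32_t*>(resource_data + byte_offset);
   buffer.bufferSize = byte_size >> 2;
   buffer.streamOffset = 0; // writing starts at the logical start
   return buffer;
}
```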


Re: [Mesa-dev] [PATCH v2] swr: Fix type to match parameters of std::max()

2016-12-02 Thread Rowley, Timothy O
Should have parens on the zsbuf test line to match your corresponding change 
for cbuf attachments.

With that change, Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Dec 2, 2016, at 1:18 PM, George Kyriazis <george.kyria...@intel.com> wrote:

Include propagation of comparisons further down.
---
src/gallium/drivers/swr/swr_clear.cpp | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_clear.cpp 
b/src/gallium/drivers/swr/swr_clear.cpp
index f59179f..08eead8 100644
--- a/src/gallium/drivers/swr/swr_clear.cpp
+++ b/src/gallium/drivers/swr/swr_clear.cpp
@@ -35,7 +35,7 @@ swr_clear(struct pipe_context *pipe,
   struct pipe_framebuffer_state *fb = &ctx->framebuffer;

   UINT clearMask = 0;
-   int layers = 0;
+   unsigned layers = 0;

   if (!swr_check_render_cond(pipe))
  return;
@@ -47,20 +47,20 @@ swr_clear(struct pipe_context *pipe,
 if (fb->cbufs[i] && (buffers & (PIPE_CLEAR_COLOR0 << i))) {
clearMask |= (SWR_ATTACHMENT_COLOR0_BIT << i);
layers = std::max(layers, fb->cbufs[i]->u.tex.last_layer -
-  fb->cbufs[i]->u.tex.first_layer + 1);
+  fb->cbufs[i]->u.tex.first_layer + 1u);
 }
   }

   if (buffers & PIPE_CLEAR_DEPTH && fb->zsbuf) {
  clearMask |= SWR_ATTACHMENT_DEPTH_BIT;
  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
-fb->zsbuf->u.tex.first_layer + 1);
+fb->zsbuf->u.tex.first_layer + 1u);
   }

   if (buffers & PIPE_CLEAR_STENCIL && fb->zsbuf) {
  clearMask |= SWR_ATTACHMENT_STENCIL_BIT;
  layers = std::max(layers, fb->zsbuf->u.tex.last_layer -
-fb->zsbuf->u.tex.first_layer + 1);
+fb->zsbuf->u.tex.first_layer + 1u);
   }

#if 0 // XXX HACK, override clear color alpha. On ubuntu, clears are
@@ -68,7 +68,7 @@ swr_clear(struct pipe_context *pipe,
   ((union pipe_color_union *)color)->f[3] = 1.0; /* cast off your const'd-ness 
*/
#endif

-   for (int i = 0; i < layers; ++i) {
+   for (unsigned i = 0; i < layers; ++i) {
  swr_update_draw_context(ctx);
  SwrClearRenderTarget(ctx->swrContext, clearMask, i,
   color->f, depth, stencil,
@@ -76,11 +76,11 @@ swr_clear(struct pipe_context *pipe,

  // Mask out the attachments that are out of layers.
  if (fb->zsbuf &&
-  fb->zsbuf->u.tex.last_layer - fb->zsbuf->u.tex.first_layer <= i)
+  fb->zsbuf->u.tex.last_layer <= fb->zsbuf->u.tex.first_layer + i)
 clearMask &= ~(SWR_ATTACHMENT_DEPTH_BIT | SWR_ATTACHMENT_STENCIL_BIT);
  for (unsigned c = 0; c < fb->nr_cbufs; ++c) {
 const struct pipe_surface *sf = fb->cbufs[c];
- if (sf && sf->u.tex.last_layer - sf->u.tex.first_layer <= i)
+ if (sf && (sf->u.tex.last_layer <= sf->u.tex.first_layer + i))
clearMask &= ~(SWR_ATTACHMENT_COLOR0_BIT << c);
  }
   }
--
2.10.0.windows.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
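
Some background on why the types matter here: std::max is a template taking two arguments of the same type, so mixing an int with an unsigned expression fails template argument deduction. The sketch below shows the all-unsigned form and the rearranged comparison from the patch; max_layers() and out_of_layers() are hypothetical helpers, not driver functions.

```cpp
#include <algorithm>

// Both arguments are unsigned, so std::max<unsigned> is deduced cleanly.
static unsigned max_layers(unsigned current, unsigned first_layer,
                           unsigned last_layer)
{
   return std::max(current, last_layer - first_layer + 1u);
}

// True once layer index i is the last layer (or beyond) of an attachment.
// Written as an addition on the right-hand side, as in the patch, so both
// sides of the comparison stay unsigned.
static bool out_of_layers(unsigned first_layer, unsigned last_layer,
                          unsigned i)
{
   return last_layer <= first_layer + i;
}
```

For an attachment spanning layers 2 through 5, the layer count is 4 and the attachment drops out of the clear mask after the pass with i == 3.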


Re: [Mesa-dev] [PATCH] swr: Fix active_queries count

2016-12-02 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Dec 1, 2016, at 7:08 PM, Bruce Cherniak <bruce.chern...@intel.com> wrote:

The active_query count was incorrect for query types that don't require
a begin_query.  Removed the unnecessary assert.
---
src/gallium/drivers/swr/swr_query.cpp | 13 +++--
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_query.cpp 
b/src/gallium/drivers/swr/swr_query.cpp
index a95e0d8..6eb0781 100644
--- a/src/gallium/drivers/swr/swr_query.cpp
+++ b/src/gallium/drivers/swr/swr_query.cpp
@@ -165,8 +165,9 @@ swr_begin_query(struct pipe_context *pipe, struct 
pipe_query *q)
   /* Initialize Results */
   memset(&pq->result, 0, sizeof(pq->result));
   switch (pq->type) {
+   case PIPE_QUERY_GPU_FINISHED:
   case PIPE_QUERY_TIMESTAMP:
-  /* nothing to do */
+  /* nothing to do, but don't want the default */
  break;
   case PIPE_QUERY_TIME_ELAPSED:
  pq->result.timestamp_start = swr_get_timestamp(pipe->screen);
@@ -181,10 +182,10 @@ swr_begin_query(struct pipe_context *pipe, struct 
pipe_query *q)
 SwrEnableStatsFE(ctx->swrContext, TRUE);
 SwrEnableStatsBE(ctx->swrContext, TRUE);
  }
+  ctx->active_queries++;
  break;
   }

-   ctx->active_queries++;

   return true;
}
@@ -195,11 +196,10 @@ swr_end_query(struct pipe_context *pipe, struct 
pipe_query *q)
   struct swr_context *ctx = swr_context(pipe);
   struct swr_query *pq = swr_query(q);

-   assert(ctx->active_queries
-  && "swr_end_query, there are no active queries!");
-   ctx->active_queries--;
-
   switch (pq->type) {
+   case PIPE_QUERY_GPU_FINISHED:
+  /* nothing to do, but don't want the default */
+  break;
   case PIPE_QUERY_TIMESTAMP:
   case PIPE_QUERY_TIME_ELAPSED:
  pq->result.timestamp_end = swr_get_timestamp(pipe->screen);
@@ -214,6 +214,7 @@ swr_end_query(struct pipe_context *pipe, struct pipe_query *q)
  swr_fence_submit(ctx, pq->fence);

  /* Only change stat collection if there are no active queries */
+  ctx->active_queries--;
  if (ctx->active_queries == 0) {
 SwrEnableStatsFE(ctx->swrContext, FALSE);
 SwrEnableStatsBE(ctx->swrContext, FALSE);
--
2.7.4
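
The counting rule in the patch above can be sketched in isolation. This is
only an illustration, not the driver code: QueryType, Context, needs_stats
and the stats_enabled flag are hypothetical stand-ins for the gallium query
types, swr_context and the SwrEnableStatsFE/BE calls. The point it shows is
that only queries reaching the "default" case touch active_queries, so an
end_query that arrives without a begin_query (a timestamp, for instance)
leaves the count alone.

```cpp
#include <cassert>

// Hypothetical stand-ins for the gallium query types named in the patch.
enum class QueryType { GpuFinished, Timestamp, TimeElapsed, OcclusionCounter };

// Hypothetical stand-in for the parts of swr_context that matter here.
struct Context {
    unsigned active_queries = 0; // statistics queries currently in flight
    bool stats_enabled = false;  // stands in for SwrEnableStatsFE/BE
};

// Only queries that reach the "default" case of the switch collect
// statistics, so only they take part in the count.
bool needs_stats(QueryType type)
{
    switch (type) {
    case QueryType::GpuFinished:
    case QueryType::Timestamp:
    case QueryType::TimeElapsed:
        return false;
    default:
        return true;
    }
}

void begin_query(Context &ctx, QueryType type)
{
    if (!needs_stats(type))
        return; // nothing to count
    ctx.stats_enabled = true;
    ctx.active_queries++;
}

void end_query(Context &ctx, QueryType type)
{
    if (!needs_stats(type))
        return; // may arrive without a matching begin_query
    assert(ctx.active_queries > 0);
    ctx.active_queries--;
    // Only change stat collection once no statistics query is active.
    if (ctx.active_queries == 0)
        ctx.stats_enabled = false;
}
```

With this shape, ending a Timestamp query on a fresh Context is harmless,
whereas an unconditional decrement would wrap the unsigned counter.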



Re: [Mesa-dev] [PATCH] gallivm: use getHostCPUFeatures on x86/llvm-4.0+.

2016-12-06 Thread Rowley, Timothy O
Interesting.  My testing was done using piglit on an avx512 capable processor, 
where I didn’t see any regressions.

llvmpipe’s “make check” also passes for me with this change on avx2 and avx512 
machines.

Was this the only regression you saw?

-Tim

> On Dec 6, 2016, at 12:27 AM, Michel Dänzer  wrote:
> 
> On 06/12/16 02:39 AM, Tim Rowley wrote:
>> Use llvm provided API based on cpuid rather than our own
>> manually maintained list of mattr enabling/disabling.
> 
> This change broke the llvmpipe unit test lp_test_format for me:
> 
> Testing PIPE_FORMAT_R32_FLOAT (float) ...
> FAILED
>  Packed: 00 00 00 00
>  Unpacked (0,0): 1 0 0 1 obtained
>  0 0 0 1 expected
> FAILED
>  Packed: 00 00 80 bf
>  Unpacked (0,0): 1 0 0 1 obtained
>  -1 0 0 1 expected
> 
> 
> This is on:
> 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 21
> model : 48
> model name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
> stepping  : 1
> microcode : 0x6003106
> cpu MHz   : 4100.000
> cache size: 2048 KB
> physical id   : 0
> siblings  : 4
> core id   : 0
> cpu cores : 2
> apicid: 16
> initial apicid: 0
> fpu   : yes
> fpu_exception : yes
> cpuid level   : 13
> wp: yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
> rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf 
> eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave 
> avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
> 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext 
> perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 
> xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid 
> decodeassists pausefilter pfthreshold overflow_recov
> bugs  : fxsave_leak sysret_ss_attrs null_seg
> bogomips  : 8200.42
> TLB size  : 1536 4K pages
> clflush size  : 64
> cache_alignment   : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [13]
> 
> 
> 
> -- 
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer



Re: [Mesa-dev] [PATCH] swr: perform perspective division on clip distances

2016-12-08 Thread Rowley, Timothy O

> On Nov 24, 2016, at 2:29 PM, Ilia Mirkin  wrote:
> 
> Clip distances need to be perspective-divided. This fixes all the
> interpolation-*-{distance,vertex} piglits.
> 
> Also take this opportunity to fix clip distances for points rasterized
> as triangles - the clip distance is not subject to sprite coord
> replacement, so there's no interpolation of it. We just take its value
> and put it in the "z" component of the barycentric-ready plane equation.
> (We could also just cull it at an earlier point in time, but that would
> require larger changes.)
> 

Would prefer this second change moved to a separate commit.  I’ve spent the 
most time looking at that, and still not convinced it’s correct.

> Signed-off-by: Ilia Mirkin 
> ---
> src/gallium/drivers/swr/rasterizer/core/binner.cpp | 22 +++---
> 1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
> index 6f9259f..d5f2e97 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
> @@ -383,7 +383,7 @@ PFN_PROCESS_ATTRIBUTES GetProcessAttributesFunc(uint32_t 
> NumVerts, bool IsSwizzl
> /// @param clipDistMask - mask of enabled clip distances
> /// @param pUserClipBuffer - buffer to store results
> template
> -void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t 
> clipDistMask, float* pUserClipBuffer)
> +void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t 
> clipDistMask, float *pRecipW, float* pUserClipBuffer)
> {
> DWORD clipDist;
> while (_BitScanForward(&clipDist, clipDistMask))
> @@ -407,11 +407,12 @@ void ProcessUserClipDist(PA_STATE& pa, uint32_t 
> primIndex, uint8_t clipDistMask,
> 
> // setup plane equations for barycentric interpolation in the backend
> float baryCoeff[NumVerts];
> +float last = vertClipDist[NumVerts - 1] * pRecipW[NumVerts - 1];
> for (uint32_t e = 0; e < NumVerts - 1; ++e)
> {
> -baryCoeff[e] = vertClipDist[e] - vertClipDist[NumVerts - 1];
> +baryCoeff[e] = vertClipDist[e] * pRecipW[e] - last;
> }
> -baryCoeff[NumVerts - 1] = vertClipDist[NumVerts - 1];
> +baryCoeff[NumVerts - 1] = last;
> 
> for (uint32_t e = 0; e < NumVerts; ++e)
> {
> @@ -834,7 +835,7 @@ endBinTriangles:
> {
> uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
> desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 * 
> sizeof(float));
> -ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, 
> desc.pUserClipBuffer);
> +ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, 
> &desc.pTriBuffer[12], desc.pUserClipBuffer);
> }
> 
> for (uint32_t y = aMTTop[triIndex]; y <= aMTBottom[triIndex]; ++y)
> @@ -1184,8 +1185,15 @@ void BinPoints(
> if (rastState.clipDistanceMask)
> {
> uint32_t numClipDist = 
> _mm_popcnt_u32(rastState.clipDistanceMask);
> -desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 
> * sizeof(float));
> -ProcessUserClipDist<2>(pa, primIndex, 
> rastState.clipDistanceMask, desc.pUserClipBuffer);
> +desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 
> * sizeof(float));
> +float dists[8];
> +float one = 1.0f;
> +ProcessUserClipDist<1>(pa, primIndex, 
> rastState.clipDistanceMask, &one, dists);
> +for (uint32_t i = 0; i < numClipDist; i++) {
> +desc.pUserClipBuffer[3*i + 0] = 0.0f;
> +desc.pUserClipBuffer[3*i + 1] = 0.0f;
> +desc.pUserClipBuffer[3*i + 2] = dists[i];
> +}
> }
> 
> MacroTileMgr *pTileMgr = pDC->pTileMgr;
> @@ -1396,7 +1404,7 @@ void BinPostSetupLines(
> {
> uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
> desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * 
> sizeof(float));
> -ProcessUserClipDist<2>(pa, primIndex, 
> rastState.clipDistanceMask, desc.pUserClipBuffer);
> +ProcessUserClipDist<2>(pa, primIndex, 
> rastState.clipDistanceMask, &desc.pTriBuffer[12], desc.pUserClipBuffer);
> }
> 
> MacroTileMgr *pTileMgr = pDC->pTileMgr;
> -- 
> 2.7.3
> 



Re: [Mesa-dev] [PATCH v2 2/2] swr: supply proper clip distances to point sprites

2016-12-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Dec 8, 2016, at 8:21 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Large points become pairs of triangles when rasterized, so we must feed
it three clip distances, one for each vertex.

The clip distance is not subject to sprite coord replacement, so there's
no interpolation of it. We just take its value and put it in the "z"
component of the barycentric-ready plane equation.

(We could also just cull it at an earlier point in time, but that would
require larger changes.)

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/core/binner.cpp | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 1538020..d5f2e97 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -1185,9 +1185,15 @@ void BinPoints(
if (rastState.clipDistanceMask)
{
uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
-float one[2] = {1.0f, 1.0f};
-desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * sizeof(float));
-ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, one, desc.pUserClipBuffer);
+desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 * sizeof(float));
+float dists[8];
+float one = 1.0f;
+ProcessUserClipDist<1>(pa, primIndex, rastState.clipDistanceMask, &one, dists);
+for (uint32_t i = 0; i < numClipDist; i++) {
+desc.pUserClipBuffer[3*i + 0] = 0.0f;
+desc.pUserClipBuffer[3*i + 1] = 0.0f;
+desc.pUserClipBuffer[3*i + 2] = dists[i];
+}
}

MacroTileMgr *pTileMgr = pDC->pTileMgr;
--
2.7.3
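
The "z component" trick described in the commit message can be sketched as
follows. This assumes the backend evaluates each clip distance as
a*i + b*j + c from three stored coefficients, which matches how the
coefficients are built in ProcessUserClipDist but is not shown in this
patch; eval_plane and constant_clip_planes are hypothetical helper names.
Setting a and b to zero makes the value the same at every pixel of the
sprite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluate one attribute from three plane-equation coefficients at
// barycentric coordinates (i, j).  Assumed layout: coeff[2] is the
// constant term.
float eval_plane(const float *coeff, float i, float j)
{
    return coeff[0] * i + coeff[1] * j + coeff[2];
}

// Build one constant plane equation per enabled clip distance, mirroring
// the loop in the patch that fills pUserClipBuffer with (0, 0, dist).
std::vector<float> constant_clip_planes(const std::vector<float> &dists)
{
    std::vector<float> planes(dists.size() * 3);
    for (std::size_t n = 0; n < dists.size(); ++n) {
        planes[3 * n + 0] = 0.0f;
        planes[3 * n + 1] = 0.0f;
        planes[3 * n + 2] = dists[n];
    }
    return planes;
}

// Usage: whatever (i, j) the rasterizer supplies, the point's own clip
// distance comes back unchanged.
void example()
{
    const std::vector<float> planes = constant_clip_planes({0.5f, -2.0f});
    assert(eval_plane(&planes[0], 0.25f, 0.5f) == 0.5f);
    assert(eval_plane(&planes[3], 0.25f, 0.5f) == -2.0f);
}
```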




Re: [Mesa-dev] [PATCH v2 1/2] swr: perform perspective division on clip distances

2016-12-08 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Dec 8, 2016, at 8:21 PM, Ilia Mirkin <imir...@alum.mit.edu> wrote:

Clip distances need to be perspective-divided. This fixes all the
interpolation-*-{distance,vertex} piglits.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
src/gallium/drivers/swr/rasterizer/core/binner.cpp | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 6f9259f..1538020 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -383,7 +383,7 @@ PFN_PROCESS_ATTRIBUTES GetProcessAttributesFunc(uint32_t NumVerts, bool IsSwizzl
/// @param clipDistMask - mask of enabled clip distances
/// @param pUserClipBuffer - buffer to store results
template
-void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t clipDistMask, float* pUserClipBuffer)
+void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t clipDistMask, float *pRecipW, float* pUserClipBuffer)
{
DWORD clipDist;
while (_BitScanForward(&clipDist, clipDistMask))
@@ -407,11 +407,12 @@ void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, uint8_t clipDistMask,

// setup plane equations for barycentric interpolation in the backend
float baryCoeff[NumVerts];
+float last = vertClipDist[NumVerts - 1] * pRecipW[NumVerts - 1];
for (uint32_t e = 0; e < NumVerts - 1; ++e)
{
-baryCoeff[e] = vertClipDist[e] - vertClipDist[NumVerts - 1];
+baryCoeff[e] = vertClipDist[e] * pRecipW[e] - last;
}
-baryCoeff[NumVerts - 1] = vertClipDist[NumVerts - 1];
+baryCoeff[NumVerts - 1] = last;

for (uint32_t e = 0; e < NumVerts; ++e)
{
@@ -834,7 +835,7 @@ endBinTriangles:
{
uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 3 * sizeof(float));
-ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, desc.pUserClipBuffer);
+ProcessUserClipDist<3>(pa, triIndex, rastState.clipDistanceMask, &desc.pTriBuffer[12], desc.pUserClipBuffer);
}

for (uint32_t y = aMTTop[triIndex]; y <= aMTBottom[triIndex]; ++y)
@@ -1184,8 +1185,9 @@ void BinPoints(
if (rastState.clipDistanceMask)
{
uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
+float one[2] = {1.0f, 1.0f};
desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * sizeof(float));
-ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, desc.pUserClipBuffer);
+ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, one, desc.pUserClipBuffer);
}

MacroTileMgr *pTileMgr = pDC->pTileMgr;
@@ -1396,7 +1398,7 @@ void BinPostSetupLines(
{
uint32_t numClipDist = _mm_popcnt_u32(rastState.clipDistanceMask);
desc.pUserClipBuffer = (float*)pArena->Alloc(numClipDist * 2 * sizeof(float));
-ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, desc.pUserClipBuffer);
+ProcessUserClipDist<2>(pa, primIndex, rastState.clipDistanceMask, &desc.pTriBuffer[12], desc.pUserClipBuffer);
}

MacroTileMgr *pTileMgr = pDC->pTileMgr;
--
2.7.3
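
As a rough standalone sketch of what the patch changes: each per-vertex
clip distance is multiplied by that vertex's 1/w before the
barycentric-ready coefficients are formed. The function names and the use
of std::array are illustrative assumptions; the arithmetic follows the
baryCoeff loop in the diff, and interpolate assumes the backend evaluates
a*i + b*j + c.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Form plane-equation coefficients from per-vertex clip distances that are
// first scaled by 1/w.  The last coefficient is the constant term; the
// others are differences against it, as in the patched loop.
template <std::size_t NumVerts>
std::array<float, NumVerts>
clip_dist_coeffs(const std::array<float, NumVerts> &dist,
                 const std::array<float, NumVerts> &recipW)
{
    static_assert(NumVerts >= 1, "need at least one vertex");
    std::array<float, NumVerts> coeff{};
    const float last = dist[NumVerts - 1] * recipW[NumVerts - 1];
    for (std::size_t e = 0; e + 1 < NumVerts; ++e)
        coeff[e] = dist[e] * recipW[e] - last;
    coeff[NumVerts - 1] = last;
    return coeff;
}

// Evaluate the triangle case at barycentric weights (i, j).
float interpolate(const std::array<float, 3> &coeff, float i, float j)
{
    return coeff[0] * i + coeff[1] * j + coeff[2];
}

// Usage: at a vertex the result is that vertex's distance times its 1/w.
void example()
{
    const std::array<float, 3> dist = {2.0f, 4.0f, 6.0f};
    const std::array<float, 3> recipW = {0.5f, 0.25f, 0.5f};
    const std::array<float, 3> coeff = clip_dist_coeffs(dist, recipW);
    assert(interpolate(coeff, 1.0f, 0.0f) == 1.0f); // vertex 0: 2 * 0.5
    assert(interpolate(coeff, 0.0f, 1.0f) == 1.0f); // vertex 1: 4 * 0.25
    assert(interpolate(coeff, 0.0f, 0.0f) == 3.0f); // vertex 2: 6 * 0.5
}
```

Passing a recipW of all ones reproduces the pre-patch coefficients, which
is what the BinPoints hunk above relies on.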




Re: [Mesa-dev] [PATCH] swr: Fix BugID 99119 compile error (icc-only).

2016-12-22 Thread Rowley, Timothy O
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99119
Reviewed-by: Tim Rowley <timothy.o.row...@intel.com>

On Dec 22, 2016, at 6:06 PM, Bruce Cherniak <bruce.chern...@intel.com> wrote:

ICC doesn't like the use of nullptr (std::nullptr_t) argument in
p_atomic_set.  GCC and clang don't complain.
---
src/gallium/drivers/swr/swr_fence_work.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/swr_fence_work.cpp b/src/gallium/drivers/swr/swr_fence_work.cpp
index 3f83e61..1fd2a83 100644
--- a/src/gallium/drivers/swr/swr_fence_work.cpp
+++ b/src/gallium/drivers/swr/swr_fence_work.cpp
@@ -39,7 +39,7 @@ swr_fence_do_work(struct swr_fence *fence)
  work = fence->work.head.next;
  /* Immediately clear the head so any new work gets added to a new work
   * queue */
-  p_atomic_set(&fence->work.head.next, nullptr);
+  p_atomic_set(&fence->work.head.next, 0);
  p_atomic_set(&fence->work.tail, &fence->work.head);
  p_atomic_set(&fence->work.count, 0);

--
2.7.4



Re: [Mesa-dev] [PATCH v4] swr: Refactor checks for compiler feature flags

2016-06-29 Thread Rowley, Timothy O
Tested on gcc-5.3.1, clang-3.8, icc-16.0.3

Reviewed-by: Tim Rowley 
Tested-by: Tim Rowley 

> On Jun 28, 2016, at 2:50 PM, Chuck Atkins  wrote:
> 
> Encapsulate the test for which flags are needed to get a compiler to
> support certain features.  Along with this, give various options to try
> for AVX and AVX2 support.  Ideally we want to use specific instruction
> set feature flags, like -mavx2 for instance instead of -march=haswell,
> but the flags required for certain compilers are different.  This
> allows, for AVX2 for instance, GCC to use -mavx2 -mfma -mbmi2 -mf16c
> while the Intel compiler which doesn't support those flags can fall
> back to using -march=core-avx2.
> 
> This addresses a bug where the Intel compiler will silently ignore the
> AVX2 instruction feature flags and then potentially fail to build.
> 
> v2: Pass preprocessor-check argument as true-state instead of
>false-state for clarity.
> v3: Reduce AVX2 define test to just __AVX2__.  Additional defines such as
>__FMA__, __BMI2__, and __F16C__ appear to be inconsistently defined
>w.r.t. their availability.
> v4: Fix C++11 flags being added globally and add more logic to
>swr_require_cxx_feature_flags
> 
> Cc: Tim Rowley 
> Signed-off-by: Chuck Atkins 
> ---
> configure.ac| 73 +
> src/gallium/drivers/swr/Makefile.am |  4 +-
> 2 files changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/configure.ac b/configure.ac
> index cc9bc47..8321e8e 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2330,6 +2330,45 @@ swr_llvm_check() {
> fi
> }
> 
> +swr_require_cxx_feature_flags() {
> +feature_name="$1"
> +preprocessor_test="$2"
> +option_list="$3"
> +output_var="$4"
> +
> +AC_MSG_CHECKING([whether $CXX supports $feature_name])
> +AC_LANG_PUSH([C++])
> +save_CXXFLAGS="$CXXFLAGS"
> +save_IFS="$IFS"
> +IFS=","
> +found=0
> +for opts in $option_list
> +do
> +unset IFS
> +CXXFLAGS="$opts $save_CXXFLAGS"
> +AC_COMPILE_IFELSE(
> +[AC_LANG_PROGRAM(
> +[   #if !($preprocessor_test)
> +#error
> +#endif
> +])],
> +[found=1; break],
> +[])
> +IFS=","
> +done
> +IFS="$save_IFS"
> +CXXFLAGS="$save_CXXFLAGS"
> +AC_LANG_POP([C++])
> +if test $found -eq 1; then
> +AC_MSG_RESULT([$opts])
> +eval "$output_var=\$opts"
> +return 0
> +fi
> +AC_MSG_RESULT([no])
> +AC_MSG_ERROR([swr requires $feature_name support])
> +return 1
> +}
> +
> dnl Duplicates in GALLIUM_DRIVERS_DIRS are removed by sorting it after this 
> block
> if test -n "$with_gallium_drivers"; then
> gallium_drivers=`IFS=', '; echo $with_gallium_drivers`
> @@ -2399,31 +2438,19 @@ if test -n "$with_gallium_drivers"; then
> xswr)
> swr_llvm_check "swr"
> 
> -AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
> -SWR_AVX_CXXFLAGS="-mavx"
> -SWR_AVX2_CXXFLAGS="-mavx2 -mfma -mbmi2 -mf16c"
> -
> -AC_LANG_PUSH([C++])
> -save_CXXFLAGS="$CXXFLAGS"
> -CXXFLAGS="-std=c++11 $CXXFLAGS"
> -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -  [AC_MSG_ERROR([c++11 compiler support not 
> detected])])
> -CXXFLAGS="$save_CXXFLAGS"
> -
> -save_CXXFLAGS="$CXXFLAGS"
> -CXXFLAGS="$SWR_AVX_CXXFLAGS $CXXFLAGS"
> -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -  [AC_MSG_ERROR([AVX compiler support not 
> detected])])
> -CXXFLAGS="$save_CXXFLAGS"
> -
> -save_CFLAGS="$CXXFLAGS"
> -CXXFLAGS="$SWR_AVX2_CXXFLAGS $CXXFLAGS"
> -AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
> -  [AC_MSG_ERROR([AVX2 compiler support not 
> detected])])
> -CXXFLAGS="$save_CXXFLAGS"
> -AC_LANG_POP([C++])
> +swr_require_cxx_feature_flags "C++11" "__cplusplus >= 201103L" \
> +",-std=c++11" \
> +SWR_CXX11_CXXFLAGS
> +AC_SUBST([SWR_CXX11_CXXFLAGS])
> 
> +swr_require_cxx_feature_flags "AVX" "defined(__AVX__)" \
> +",-mavx,-march=core-avx" \
> +SWR_AVX_CXXFLAGS
> AC_SUBST([SWR_AVX_CXXFLAGS])
> +
> +swr_require_cxx_feature_flags "AVX2" "defined(__AVX2__)" \
> +",-mavx2 -mfma -mbmi2 -mf16c,-march=core-avx2" \
> +SWR_AVX2_CXXFLAGS
> AC_SUBST([SWR_AVX2_CXXFLAGS])
> 
> HAVE_GALLIUM_SWR=yes
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index d896154..210b203 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -22,7 +22,7 @@
> include Makefile.sources
> include $(top_sr

Re: [Mesa-dev] [PATCH 1/5] swr: [rasterizer] add support for llvm-3.9

2016-07-07 Thread Rowley, Timothy O

> On Jul 6, 2016, at 7:32 PM, Roland Scheidegger  wrote:
> 
> Am 06.07.2016 um 23:51 schrieb Tim Rowley:
>> ---
>> .../drivers/swr/rasterizer/jitter/builder_misc.cpp | 38 
>> --
>> .../jitter/scripts/gen_llvm_ir_macros.py   |  5 ---
>> 2 files changed, 28 insertions(+), 15 deletions(-)
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp 
>> b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
>> index 671178f..b23a10d 100644
>> --- a/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
>> +++ b/src/gallium/drivers/swr/rasterizer/jitter/builder_misc.cpp
>> @@ -700,20 +700,22 @@ Value *Builder::PSHUFB(Value* a, Value* b)
>> /// lower 8 values are used.
>> Value *Builder::PMOVSXBD(Value* a)
>> {
>> -Value* res;
>> +// llvm-3.9 removed the pmovsxbd intrinsic
>> +#if HAVE_LLVM < 0x309
>> // use avx2 byte sign extend instruction if available
>> if(JM()->mArch.AVX2())
>> {
>> -res = VPMOVSXBD(a);
>> +Function *pmovsxbd = 
>> Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmovsxbd);
>> +return CALL(pmovsxbd, std::initializer_list{a});
>> }
>> else
>> +#endif
>> {
>> // VPMOVSXBD output type
>> Type* v8x32Ty = VectorType::get(mInt32Ty, 8);
>> // Extract 8 values from 128bit lane and sign extend
>> -res = S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> +return S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> }
>> -return res;
>> }
>> 
>> //
>> @@ -722,20 +724,22 @@ Value *Builder::PMOVSXBD(Value* a)
>> /// @param a - 128bit SIMD lane(8x16bit) of 16bit integer values.
>> Value *Builder::PMOVSXWD(Value* a)
>> {
>> -Value* res;
>> +// llvm-3.9 removed the pmovsxwd intrinsic
>> +#if HAVE_LLVM < 0x309
>> // use avx2 word sign extend if available
>> if(JM()->mArch.AVX2())
>> {
>> -res = VPMOVSXWD(a);
>> +Function *pmovsxwd = 
>> Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmovsxwd);
>> +return CALL(pmovsxwd, std::initializer_list{a});
>> }
>> else
>> +#endif
>> {
>> // VPMOVSXWD output type
>> Type* v8x32Ty = VectorType::get(mInt32Ty, 8);
>> // Extract 8 values from 128bit lane and sign extend
>> -res = S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> +return S_EXT(VSHUFFLE(a, a, C({0, 1, 2, 3, 4, 5, 6, 7})), 
>> v8x32Ty);
>> }
>> -return res;
>> }
>> 
>> //
>> @@ -875,9 +879,15 @@ Value *Builder::CVTPS2PH(Value* a, Value* rounding)
>> 
>> Value *Builder::PMAXSD(Value* a, Value* b)
>> {
>> +// llvm-3.9 removed the pmax intrinsics
>> +#if HAVE_LLVM >= 0x309
>> +Value* cmp = ICMP_UGT(a, b);
>> +return SELECT(VMASK(cmp), a, b);
>> +#else
>> if (JM()->mArch.AVX2())
>> {
>> -return VPMAXSD(a, b);
>> +Function* pmaxsd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmaxs_d);
>> +return CALL(pmaxsd, {a, b});
>> }
>> else
>> {
>> @@ -900,13 +910,20 @@ Value *Builder::PMAXSD(Value* a, Value* b)
>> 
>> return result;
>> }
>> +#endif
>> }
>> 
>> Value *Builder::PMINSD(Value* a, Value* b)
>> {
>> +// llvm-3.9 removed the pmin intrinsics
>> +#if HAVE_LLVM >= 0x309
>> +Value* cmp = ICMP_ULT(a, b);
>> +return SELECT(VMASK(cmp), a, b);
>> +#else
> Yep, had to deal with that in gallivm as well...
> That said, these were signed min/max here. I think you wanted to use
> ICMP_SLT/ICMP_SGT…

llvm developers do seem intent on pruning the list of x86 intrinsics.  Thanks 
for spotting the mistake - updated patch coming.

-Tim
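
For reference, the difference Roland points out can be reproduced in plain
C++. This is a scalar sketch rather than LLVM IR: the two functions mimic
a compare-and-select built on a signed (ICMP_SGT) and an unsigned
(ICMP_UGT) predicate, and their names are made up for this example. The
results only diverge when exactly one operand is negative, which is why
the mistake is easy to miss.

```cpp
#include <cassert>
#include <cstdint>

// Compare-and-select with a signed predicate: the intended PMAXSD result.
int32_t max_signed_compare(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

// Compare-and-select with an unsigned predicate: the same bits are read
// as unsigned, so every negative value looks larger than any positive one.
int32_t max_unsigned_compare(int32_t a, int32_t b)
{
    return static_cast<uint32_t>(a) > static_cast<uint32_t>(b) ? a : b;
}

// Usage: the two agree on non-negative inputs and disagree on mixed signs.
void example()
{
    assert(max_signed_compare(3, 7) == 7);
    assert(max_unsigned_compare(3, 7) == 7);
    assert(max_signed_compare(-1, 1) == 1);
    assert(max_unsigned_compare(-1, 1) == -1); // wrong for a signed max
}
```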

> Roland
> 
> 
> 
> 
>> if (JM()->mArch.AVX2())
>> {
>> -return VPMINSD(a, b);
>> +Function* pminsd = Intrinsic::getDeclaration(JM()->mpCurrentModule, 
>> Intrinsic::x86_avx2_pmins_d);
>> +return CALL(pminsd, {a, b});
>> }
>> else
>> {
>> @@ -929,6 +946,7 @@ Value *Builder::PMINSD(Value* a, Value* b)
>> 
>> return result;
>> }
>> +#endif
>> }
>> 
>> void Builder::Gather4(const SWR_FORMAT format, Value* pSrcBase, Value* 
>> byteOffsets, 
>> diff --git 
>> a/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py 
>> b/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
>> index 4963c5e..234889b 100644
>> --- a/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
>> +++ b/src/gallium/drivers/swr/rasterizer/jitter/scripts/gen_llvm_ir_macros.py
>> @@ -91,8 +91,6 @@ intrinsics = [
>> ["VRCPPS", "x86_avx_rcp_ps_256", ["a"]],
>> ["VMINPS", "x86_avx_min_ps_256", ["a", "b"]],
>> ["VMAXPS", "x86_avx_max_ps_256", ["a", "b"]],
>> -["VPMINSD", "x86_avx2_pmins_d", ["

Re: [Mesa-dev] [PATCH] configure.ac/swr: build swr with -fno-strict-aliasing

2016-08-02 Thread Rowley, Timothy O

> On Aug 2, 2016, at 1:00 PM, Matt Turner  wrote:
> 
> On Tue, Aug 2, 2016 at 10:53 AM, Tim Rowley  
> wrote:
>> swr rasterizer contains numerous data transfers between vectors
>> and ordinary C types.  Fixing for strict aliasing will take time.
> 
> Oh, sorry! I forgot about swr.
> 
>> ---
>> configure.ac| 7 +++
>> src/gallium/drivers/swr/Makefile.am | 1 +
>> 2 files changed, 8 insertions(+)
>> 
>> diff --git a/configure.ac b/configure.ac
>> index aea5890..fb4a12a 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -312,6 +312,8 @@ if test "x$GCC" = xyes; then
>> MSVC2013_COMPAT_CFLAGS="-Werror=pointer-arith"
>> MSVC2013_COMPAT_CXXFLAGS="-Werror=pointer-arith"
>> 
>> +NO_STRICT_ALIASING_CFLAGS="-fno-strict-aliasing"
>> +
>> # Enable -Werror=vla if compiler supports it
>> save_CFLAGS="$CFLAGS"
>> AC_MSG_CHECKING([whether $CC supports -Werror=vla])
>> @@ -341,11 +343,16 @@ if test "x$GXX" = xyes; then
>> 
>> # We don't want floating-point math functions to set errno or trap
>> CXXFLAGS="$CXXFLAGS -fno-math-errno -fno-trapping-math"
>> +
>> +NO_STRICT_ALIASING_CXXFLAGS="-fno-strict-aliasing"
>> fi
>> 
>> AC_SUBST([MSVC2013_COMPAT_CFLAGS])
>> AC_SUBST([MSVC2013_COMPAT_CXXFLAGS])
>> 
>> +AC_SUBST([NO_STRICT_ALIASING_CFLAGS])
>> +AC_SUBST([NO_STRICT_ALIASING_CXXFLAGS])
>> +
>> dnl even if the compiler appears to support it, using visibility attributes 
>> isn't
>> dnl going to do anything useful currently on cygwin apart from emit lots of 
>> warnings
>> case "$host_os" in
>> diff --git a/src/gallium/drivers/swr/Makefile.am 
>> b/src/gallium/drivers/swr/Makefile.am
>> index 3459af3..fa02349 100644
>> --- a/src/gallium/drivers/swr/Makefile.am
>> +++ b/src/gallium/drivers/swr/Makefile.am
>> @@ -29,6 +29,7 @@ noinst_LTLIBRARIES = libmesaswr.la
>> libmesaswr_la_SOURCES = $(LOADER_SOURCES)
>> 
>> COMMON_CXXFLAGS = \
>> +   $(NO_STRICT_ALIASING_CXXFLAGS) \
> 
> I think you should just put -fno-strict-aliasing here instead. I don't
> think the variable and substitutions in configure.ac add anything.

My thinking behind using substitution was to handle non-gcc compilers, though 
both icc and clang seem to pass configure.ac’s gcc check and silently accept 
-fno-strict-aliasing.  I’ll send another patch with just the swr makefile 
change.

-Tim
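
For context, the kind of vector-to-scalar transfer that strict aliasing
objects to, and one conforming alternative, might look like the sketch
below. It is an assumption about the general pattern, not code from swr:
Vec4 is a hypothetical stand-in for a SIMD register type, and memcpy is
one common fix among several.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a SIMD register type.
struct Vec4 {
    float lane[4];
};

// The problematic pattern reads float storage through an unrelated type:
//     uint32_t bits = *reinterpret_cast<const uint32_t *>(&v.lane[0]);
// With strict aliasing enabled the compiler may assume that access never
// overlaps a float and reorder or drop it.

// Conforming alternative: copy the bytes, which compilers typically turn
// into a plain register move.
uint32_t float_bits(float f)
{
    static_assert(sizeof(uint32_t) == sizeof(float), "size mismatch");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// Same idea for moving a whole vector out to ordinary C storage.
void store_lanes(const Vec4 &v, uint32_t out[4])
{
    std::memcpy(out, v.lane, sizeof(v.lane));
}

// Usage, assuming IEEE 754 single-precision floats.
void example()
{
    assert(float_bits(1.0f) == 0x3f800000u);
    const Vec4 v = {{0.0f, 1.0f, -1.0f, 2.0f}};
    uint32_t raw[4];
    store_lanes(v, raw);
    assert(raw[2] == 0xbf800000u);
}
```

Building with -fno-strict-aliasing instead, as the patch does, keeps the
existing casts working while such conversions are done over time.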



Re: [Mesa-dev] [PATCH RFC] gallium/swr: fold the almost identical Makefiles

2016-04-11 Thread Rowley, Timothy O
Thanks for looking at this.  I didn’t realize that automake was smart enough to 
generate separate targets for the overlapping object files; had figured that 
would be a collision.

I notice you removed the install-gallium-links portion of the makefile, which 
you had pointed out before as being a bit of a hack.  This makes testing in the 
builddir a bit more complicated (need to point LD_LIBRARY_PATH to libGL and 
the swr libs).  I could use the result of a “make install” instead, except for 
the bug where both the swrast and gallium libGL libraries get installed:

https://bugs.freedesktop.org/show_bug.cgi?id=94086

Reviewed-by: Tim Rowley 

-Tim

> On Apr 11, 2016, at 1:21 PM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> Rather than having two almost identical Makefiles, with various VPATH
> hacks just fold them, using COMMON_* variables and actually getting
> things buildable/shippable.
> 
> Cc: Tim Rowley 
> Signed-off-by: Emil Velikov 
> ---
> 
> Tim, can you double check/continue the work started. It seems to build 
> fine here, although I'm likely missing something.
> 
> Without this (or similar fix) one cannot get a distribution tarball let 
> alone run `make distcheck'. Note: currently one of the llvm tests fail 
> if you try the latter.
> 
> Thanks
> Emil
> 
> 
> configure.ac |  2 -
> src/gallium/Makefile.am  |  2 -
> src/gallium/drivers/swr/Makefile.am  | 94 +-
> src/gallium/drivers/swr/Makefile.sources | 91 +
> src/gallium/drivers/swr/avx/Makefile.am  | 99 
> src/gallium/drivers/swr/avx2/Makefile.am | 99 
> 6 files changed, 184 insertions(+), 203 deletions(-)
> delete mode 100644 src/gallium/drivers/swr/avx/Makefile.am
> delete mode 100644 src/gallium/drivers/swr/avx2/Makefile.am
> 
> diff --git a/configure.ac b/configure.ac
> index c426c72..8c82c43 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2487,8 +2487,6 @@ AC_CONFIG_FILES([Makefile
>   src/gallium/drivers/softpipe/Makefile
>   src/gallium/drivers/svga/Makefile
>   src/gallium/drivers/swr/Makefile
> - src/gallium/drivers/swr/avx/Makefile
> - src/gallium/drivers/swr/avx2/Makefile
>   src/gallium/drivers/trace/Makefile
>   src/gallium/drivers/vc4/Makefile
>   src/gallium/drivers/virgl/Makefile
> diff --git a/src/gallium/Makefile.am b/src/gallium/Makefile.am
> index 086e170..ef2bc10 100644
> --- a/src/gallium/Makefile.am
> +++ b/src/gallium/Makefile.am
> @@ -80,8 +80,6 @@ endif
> 
> if HAVE_GALLIUM_SWR
> SUBDIRS += drivers/swr
> -SUBDIRS += drivers/swr/avx
> -SUBDIRS += drivers/swr/avx2
> endif
> 
> ## vc4/rpi
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index f08806a..46e5f78 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -28,4 +28,96 @@ noinst_LTLIBRARIES = libmesaswr.la
> 
> libmesaswr_la_SOURCES = $(LOADER_SOURCES)
> 
> -EXTRA_DIST = Makefile.sources-arch
> +COMMON_CXXFLAGS = \
> + $(GALLIUM_DRIVER_CFLAGS) \
> + $(LLVM_CFLAGS) \
> + -I$(builddir)/rasterizer/scripts \
> + -I$(builddir)/rasterizer/jitter \
> + -I$(srcdir)/rasterizer \
> + -I$(srcdir)/rasterizer/core \
> + -I$(srcdir)/rasterizer/jitter
> +
> +COMMON_SOURCES = \
> + $(CXX_SOURCES) \
> + $(COMMON_CXX_SOURCES) \
> + $(CORE_CXX_SOURCES) \
> + $(JITTER_CXX_SOURCES) \
> + $(MEMORY_CXX_SOURCES) \
> + $(BUILT_SOURCES)
> +
> +BUILT_SOURCES = \
> + rasterizer/scripts/gen_knobs.cpp \
> + rasterizer/scripts/gen_knobs.h \
> + rasterizer/jitter/state_llvm.h \
> + rasterizer/jitter/builder_gen.h \
> + rasterizer/jitter/builder_gen.cpp \
> + rasterizer/jitter/builder_x86.h \
> + rasterizer/jitter/builder_x86.cpp
> +
> +rasterizer/scripts/gen_knobs.cpp rasterizer/scripts/gen_knobs.h: 
> rasterizer/scripts/gen_knobs.py rasterizer/scripts/knob_defs.py 
> rasterizer/scripts/templates/knobs.template
> + $(PYTHON2) $(PYTHON_FLAGS) \
> + $(srcdir)/rasterizer/scripts/gen_knobs.py \
> + rasterizer/scripts
> +
> +rasterizer/jitter/state_llvm.h: rasterizer/jitter/scripts/gen_llvm_types.py 
> rasterizer/core/state.h
> + $(PYTHON2) $(PYTHON_FLAGS) \
> + $(srcdir)/rasterizer/jitter/scripts/gen_llvm_types.py \
> + --input $(srcdir)/rasterizer/core/state.h \
> + --output rasterizer/jitter/state_llvm.h
> +
> +rasterizer/jitter/builder_gen.h: 
> rasterizer/jitter/scripts/gen_llvm_ir_macros.py 
> $(LLVM_INCLUDEDIR)/llvm/IR/IRBuilder.h
> + $(PYTHON2) $(PYTHON_FLAGS) \
> + $(srcdir)/rasterizer/jitter/scripts/gen_llvm_ir_macros.py \
> + --input $(LLVM_INCLUDEDIR)/llvm/IR/IRBuilder.h \
> + --output rasterizer/jitter/builder_gen.h \
> + 

Re: [Mesa-dev] [PATCH v2] gallium/swr: fold the almost identical Makefiles

2016-04-13 Thread Rowley, Timothy O
Testing this, I needed to make the following change to install-gallium-links.mk 
to avoid a segfault while building.  If that’s the right approach, the 
egl_LTLIBRARIES line probably needs the same treatment for future proofing.

diff --git a/install-gallium-links.mk b/install-gallium-links.mk
index 4010cad..0aae905 100644
--- a/install-gallium-links.mk
+++ b/install-gallium-links.mk
@@ -14,7 +14,7 @@ all-local : .install-gallium-links
$(MKDIR_P) $$link_dir;  \
file_list=$(dri_LTLIBRARIES:%.la=.libs/%.so);   \
file_list+=$(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*); \
-   file_list+=$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*); \
+   file_list+="$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)";   \
for f in $$file_list; do\
if test -h .libs/$$f; then  \
cp -d $$f $$link_dir;   \

> On Apr 13, 2016, at 9:38 AM, Emil Velikov  wrote:
> 
> From: Emil Velikov 
> 
> Rather than having two almost identical Makefiles, with various VPATH
> hacks just fold them, using COMMON_* variables and actually getting
> things buildable/shippable.
> 
> v2: whitespace fixes, remove Makefile.sources-arch
> 
> Cc: Tim Rowley 
> Signed-off-by: Emil Velikov 
> ---
> configure.ac  |   2 -
> src/gallium/Makefile.am   |   2 -
> src/gallium/drivers/swr/Makefile.am   |  94 +-
> src/gallium/drivers/swr/Makefile.sources  |  91 +
> src/gallium/drivers/swr/Makefile.sources-arch | 111 --
> src/gallium/drivers/swr/avx/Makefile.am   |  99 ---
> src/gallium/drivers/swr/avx2/Makefile.am  |  99 ---
> 7 files changed, 184 insertions(+), 314 deletions(-)
> delete mode 100644 src/gallium/drivers/swr/Makefile.sources-arch
> delete mode 100644 src/gallium/drivers/swr/avx/Makefile.am
> delete mode 100644 src/gallium/drivers/swr/avx2/Makefile.am
> 
> diff --git a/configure.ac b/configure.ac
> index c426c72..8c82c43 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -2487,8 +2487,6 @@ AC_CONFIG_FILES([Makefile
>   src/gallium/drivers/softpipe/Makefile
>   src/gallium/drivers/svga/Makefile
>   src/gallium/drivers/swr/Makefile
> - src/gallium/drivers/swr/avx/Makefile
> - src/gallium/drivers/swr/avx2/Makefile
>   src/gallium/drivers/trace/Makefile
>   src/gallium/drivers/vc4/Makefile
>   src/gallium/drivers/virgl/Makefile
> diff --git a/src/gallium/Makefile.am b/src/gallium/Makefile.am
> index 086e170..ef2bc10 100644
> --- a/src/gallium/Makefile.am
> +++ b/src/gallium/Makefile.am
> @@ -80,8 +80,6 @@ endif
> 
> if HAVE_GALLIUM_SWR
> SUBDIRS += drivers/swr
> -SUBDIRS += drivers/swr/avx
> -SUBDIRS += drivers/swr/avx2
> endif
> 
> ## vc4/rpi
> diff --git a/src/gallium/drivers/swr/Makefile.am 
> b/src/gallium/drivers/swr/Makefile.am
> index f08806a..d6d6e7d 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -28,4 +28,96 @@ noinst_LTLIBRARIES = libmesaswr.la
> 
> libmesaswr_la_SOURCES = $(LOADER_SOURCES)
> 
> -EXTRA_DIST = Makefile.sources-arch
> +COMMON_CXXFLAGS = \
> + $(GALLIUM_DRIVER_CFLAGS) \
> + $(LLVM_CFLAGS) \
> + -I$(builddir)/rasterizer/scripts \
> + -I$(builddir)/rasterizer/jitter \
> + -I$(srcdir)/rasterizer \
> + -I$(srcdir)/rasterizer/core \
> + -I$(srcdir)/rasterizer/jitter
> +
> +COMMON_SOURCES = \
> + $(CXX_SOURCES) \
> + $(COMMON_CXX_SOURCES) \
> + $(CORE_CXX_SOURCES) \
> + $(JITTER_CXX_SOURCES) \
> + $(MEMORY_CXX_SOURCES) \
> + $(BUILT_SOURCES)
> +
> +BUILT_SOURCES = \
> + rasterizer/scripts/gen_knobs.cpp \
> + rasterizer/scripts/gen_knobs.h \
> + rasterizer/jitter/state_llvm.h \
> + rasterizer/jitter/builder_gen.h \
> + rasterizer/jitter/builder_gen.cpp \
> + rasterizer/jitter/builder_x86.h \
> + rasterizer/jitter/builder_x86.cpp
> +
> +rasterizer/scripts/gen_knobs.cpp rasterizer/scripts/gen_knobs.h: 
> rasterizer/scripts/gen_knobs.py rasterizer/scripts/knob_defs.py 
> rasterizer/scripts/templates/knobs.template
> + $(PYTHON2) $(PYTHON_FLAGS) \
> + $(srcdir)/rasterizer/scripts/gen_knobs.py \
> + rasterizer/scripts
> +
> +rasterizer/jitter/state_llvm.h: rasterizer/jitter/scripts/gen_llvm_types.py 
> rasterizer/core/state.h
> + $(PYTHON2) $(PYTHON_FLAGS) \
> + $(srcdir)/rasterizer/jitter/scripts/gen_llvm_types.py \
> + --input $(srcdir)/rasterizer/core/state.h \
> + --output rasterizer/jitter/state_llvm.h
> +
> +rasterizer/jitter/builder_gen.h: 
> rasterizer/jitter/scripts/gen_llvm_ir_macros.py 
> $(LLVM_INCLUDEDIR)/llvm/IR/IRBuilder.h
> + $(PYTHON2) $(PYTHON_FLAGS) \
> + $(srcdir)/rasterizer/jitter/scripts/gen_llvm_ir_macros.py \
> + --input $(LLVM_INCLUDEDIR)/llvm/IR/IRBuil

Re: [Mesa-dev] [PATCH v2] gallium/swr: fold the almost identical Makefiles

2016-04-13 Thread Rowley, Timothy O

> On Apr 13, 2016, at 2:21 PM, Emil Velikov  wrote:
> 
> On 13 April 2016 at 19:13, Rowley, Timothy O  
> wrote:
>> Testing this, I needed to make the following change to 
>> install-gallium-links.mk to avoid a segfault while building.  If that’s the 
>> right approach, the egl_LTLIBRARIES line probably needs the same treatment 
>> for future proofing.
>> 
>> diff --git a/install-gallium-links.mk b/install-gallium-links.mk
>> index 4010cad..0aae905 100644
>> --- a/install-gallium-links.mk
>> +++ b/install-gallium-links.mk
>> @@ -14,7 +14,7 @@ all-local : .install-gallium-links
>>$(MKDIR_P) $$link_dir;  \
>>file_list=$(dri_LTLIBRARIES:%.la=.libs/%.so);   \
>>file_list+=$(egl_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*); \
>> -   file_list+=$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*); \
>> +   file_list+="$(lib_LTLIBRARIES:%.la=.libs/%.$(LIB_EXT)*)";   \
>>for f in $$file_list; do\
>>if test -h .libs/$$f; then  \
>>cp -d $$f $$link_dir;   \
>> 
> Ahh yes. The whitespace will cause bash to [attempt to] execute the
> LTLIBRARIES and things will go crazy.
> Can you please send a patch that updates the whole file ?

Sent a patch along those lines to mesa-dev.

>>> On Apr 13, 2016, at 9:38 AM, Emil Velikov  wrote:
>>> 
>>> From: Emil Velikov 
>>> 
>>> Rather than having two almost identical Makefiles, with various VPATH
>>> hacks just fold them, using COMMON_* variables and actually getting
>>> things buildable/shippable.
>>> 
>>> v2: whitespace fixes, remove Makefile.sources-arch
>>> 
>>> Cc: Tim Rowley 
>>> Signed-off-by: Emil Velikov 
> I take it that I've not botched things with v2 and the r-b still holds ?

Yes, still stands.  Reviewed-by: Tim Rowley 

-Tim

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium/swr: confine c++11 flag to swr driver

2016-04-15 Thread Rowley, Timothy O

> On Apr 15, 2016, at 10:33 AM, Jose Fonseca  wrote:
> 
> On 15/04/16 00:30, Tim Rowley wrote:
>> On the philosophy that a driver shouldn't change the compile flags
>> for the entire tree, take the clover approach of moving the c++11 flag
>> to the swr driver directory.
>> ---
>>  configure.ac|   9 +-
>>  m4/ax_cxx_compile_stdcxx.m4 | 558 
>> 
>>  src/gallium/drivers/swr/Makefile.am |   3 +-
>>  3 files changed, 9 insertions(+), 561 deletions(-)
>>  delete mode 100644 m4/ax_cxx_compile_stdcxx.m4
>> 
>> diff --git a/configure.ac b/configure.ac
>> index 8c82c43..6155942 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -2265,15 +2265,20 @@ if test -n "$with_gallium_drivers"; then
>>  fi
>>  ;;
>>  xswr)
>> -AX_CXX_COMPILE_STDCXX([11], [noext], [mandatory])
>>  swr_llvm_check "swr"
>> 
>> -AC_MSG_CHECKING([whether $CXX supports AVX/AVX2])
>> +AC_MSG_CHECKING([whether $CXX supports c++11/AVX/AVX2])
>>  AVX_CXXFLAGS="-march=core-avx-i"
>>  AVX2_CXXFLAGS="-march=core-avx2"
>> 
>>  AC_LANG_PUSH([C++])
>>  save_CXXFLAGS="$CXXFLAGS"
>> +CXXFLAGS="-std=c++11 $CXXFLAGS"
>> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
>> +  [AC_MSG_ERROR([c++11 compiler support not 
>> detected])])
>> +CXXFLAGS="$save_CXXFLAGS"
>> +
>> +save_CXXFLAGS="$CXXFLAGS"
>>  CXXFLAGS="$AVX_CXXFLAGS $CXXFLAGS"
>>  AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],[],
>>[AC_MSG_ERROR([AVX compiler support not 
>> detected])])
>> diff --git a/m4/ax_cxx_compile_stdcxx.m4 b/m4/ax_cxx_compile_stdcxx.m4
>> deleted file mode 100644
>> index 079e17d..000
>> --- a/m4/ax_cxx_compile_stdcxx.m4
>> +++ /dev/null
>> @@ -1,558 +0,0 @@
>> -# 
>> ===
>> -#   http://www.gnu.org/software/autoconf-archive/ax_cxx_compile_stdcxx.html
>> -# 
>> ===
>> -#
>> -# SYNOPSIS
>> -#
>> -#   AX_CXX_COMPILE_STDCXX(VERSION, [ext|noext], [mandatory|optional])
>> -#
>> -# DESCRIPTION
>> -#
>> -#   Check for baseline language coverage in the compiler for the specified
>> -#   version of the C++ standard.  If necessary, add switches to CXXFLAGS to
>> -#   enable support.  VERSION may be '11' (for the C++11 standard) or '14'
>> -#   (for the C++14 standard).
>> -#
>> -#   The second argument, if specified, indicates whether you insist on an
>> -#   extended mode (e.g. -std=gnu++11) or a strict conformance mode (e.g.
>> -#   -std=c++11).  If neither is specified, you get whatever works, with
>> -#   preference for an extended mode.
>> -#
>> -#   The third argument, if specified 'mandatory' or if left unspecified,
>> -#   indicates that baseline support for the specified C++ standard is
>> -#   required and that the macro should error out if no mode with that
>> -#   support is found.  If specified 'optional', then configuration proceeds
>> -#   regardless, after defining HAVE_CXX${VERSION} if and only if a
>> -#   supporting mode is found.
>> -#
>> -# LICENSE
>> -#
>> -#   Copyright (c) 2008 Benjamin Kosnik 
>> -#   Copyright (c) 2012 Zack Weinberg 
>> -#   Copyright (c) 2013 Roy Stogner 
>> -#   Copyright (c) 2014, 2015 Google Inc.; contributed by Alexey Sokolov 
>> 
>> -#   Copyright (c) 2015 Paul Norman 
>> -#   Copyright (c) 2015 Moritz Klammler 
>> -#
>> -#   Copying and distribution of this file, with or without modification, are
>> -#   permitted in any medium without royalty provided the copyright notice
>> -#   and this notice are preserved.  This file is offered as-is, without any
>> -#   warranty.
>> -
>> -#serial 1
>> -
>> -dnl  This macro is based on the code from the AX_CXX_COMPILE_STDCXX_11 macro
>> -dnl  (serial version number 13).
>> -
>> -AC_DEFUN([AX_CXX_COMPILE_STDCXX], [dnl
>> -  m4_if([$1], [11], [],
>> -[$1], [14], [],
>> -[$1], [17], [m4_fatal([support for C++17 not yet implemented in 
>> AX_CXX_COMPILE_STDCXX])],
>> -[m4_fatal([invalid first argument `$1' to 
>> AX_CXX_COMPILE_STDCXX])])dnl
>> -  m4_if([$2], [], [],
>> -[$2], [ext], [],
>> -[$2], [noext], [],
>> -[m4_fatal([invalid second argument `$2' to 
>> AX_CXX_COMPILE_STDCXX])])dnl
>> -  m4_if([$3], [], [ax_cxx_compile_cxx$1_required=true],
>> -[$3], [mandatory], [ax_cxx_compile_cxx$1_required=true],
>> -[$3], [optional], [ax_cxx_compile_cxx$1_required=false],
>> -[m4_fatal([invalid third argument `$3' to AX_CXX_COMPILE_STDCXX])])
>> -  AC_LANG_PUSH([C++])dnl
>> -  ac_success=no
>> -  AC_CACHE_CHECK(whether $CXX supports C++$1 features by default,
>> -  ax_cv_cxx_compile_cxx$1,
>> -  [AC_COMPILE_IFELSE([AC_LANG_SOURCE([_AX_CXX_COMPILE_STDCXX_testbody_$1])],
>> -[ax_cv_

Re: [Mesa-dev] [PATCH] swr: ignore generated files in rasterizer

2016-04-15 Thread Rowley, Timothy O

> On Apr 15, 2016, at 1:30 PM, Ilia Mirkin  wrote:
> 
> Signed-off-by: Ilia Mirkin 
> ---
> src/gallium/drivers/swr/.gitignore | 7 +++
> 1 file changed, 7 insertions(+)
> create mode 100644 src/gallium/drivers/swr/.gitignore
> 
> diff --git a/src/gallium/drivers/swr/.gitignore 
> b/src/gallium/drivers/swr/.gitignore
> new file mode 100644
> index 000..c5b6416
> --- /dev/null
> +++ b/src/gallium/drivers/swr/.gitignore
> @@ -0,0 +1,7 @@
> +rasterizer/jitter/builder_gen.cpp
> +rasterizer/jitter/builder_gen.h
> +rasterizer/jitter/builder_x86.cpp
> +rasterizer/jitter/builder_x86.h
> +rasterizer/jitter/state_llvm.h
> +rasterizer/scripts/gen_knobs.cpp
> +rasterizer/scripts/gen_knobs.h
> -- 
> 2.7.3
> 

Reviewed-by: Tim Rowley 


Re: [Mesa-dev] [PATCH 07/11] swr: [rasterizer] Interpolation utility functions

2016-04-19 Thread Rowley, Timothy O

> On Apr 14, 2016, at 4:42 PM, Roland Scheidegger  wrote:
> 
> On 14.04.2016 at 21:53, Tim Rowley wrote:
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.h 
>> b/src/gallium/drivers/swr/rasterizer/core/frontend.h
>> index 8307c0b..12e7ae4 100644
>> --- a/src/gallium/drivers/swr/rasterizer/core/frontend.h
>> +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.h
>> @@ -307,6 +307,18 @@ bool CanUseSimplePoints(DRAW_CONTEXT *pDC)
>> !state.rastState.pointSpriteEnable);
>> }
>> 
>> +INLINE
>> +bool vIsNaN(const __m128& vec)
>> +{
>> +const __m128i& veci = _mm_castps_si128(vec);
>> +const __m128i fraction = _mm_and_si128(veci, 
>> _mm_set1_epi32(0x007f));
>> +const __m128i exponent = _mm_and_si128(veci, 
>> _mm_set1_epi32(0x7f80));
>> +__m128i result = _mm_cmpeq_epi32(exponent, _mm_set1_epi32(0));
>> +result = _mm_andnot_si128(_mm_cmpeq_epi32(fraction, _mm_set1_epi32(0)), 
>> result);
>> +int32_t mask = _mm_movemask_ps(_mm_castsi128_ps(result));
>> +return (mask > 0);
>> +}
> You could do this simpler by just doing abs on the source (which is a and)
> followed by a single _mm_cmpgt_epi32() against max exponent (0x7f80).
> Or do what lp_build_isnan does: just use _mm_cmp_ps with ordered/eq
> (using same source twice) and revert the bits. (Albeit I think we're not
> using the integer comparisons, which are nominally faster, in that code
> because we might have 8-wide vectors hence when avx but not avx2 isn't
> available this would be quite suboptimal.)
> That said, I'm actually wondering why not just doing a simple single
> unordered comparison, that should give the right result without having
> to invert the bits (though it's possible llvm does this on its own in
> the gallivm code).

Thanks for looking at this.  It turns out we use your suggested method in other 
areas of the code (VCMPPS_ISNAN in our jit builder, and ComputeNaNMask in the 
clipper).  An updated patch will be coming.

-Tim


Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2016-02-17 Thread Rowley, Timothy O
> On Nov 18, 2015, at 12:34 PM, Emil Velikov  wrote:
> I have no objections against getting this merged, although here are a
> couple of things that should be sorted. Some of these are just
> reiteration from others:

Sorry about the delay responding to this; we’ve been working on a number of the 
issues you mentioned (plus the usual year-end holidays and other work).

> - First and foremost - please base your work against master. Mesa,
> alike most other open-source projects, tries to keep features out of
> bugfix releases. As such basing things against 11.0 is not suitable.

Basing our efforts on a particular Mesa branch was an initial development 
decision to keep a stable base while we figured out how to build a driver from 
scratch.  We have now rebased to the Mesa master and periodically merge updates.

> - Further combinatorial explosion of build configurations - with
> internal/external core, swr-arch, etc. Some of these can (should?) be
> nuked, although further comments will follow as patch(es) hit the
> mailing list.

All the additional swr build options have been removed, leaving swr simply as 
an additional gallium driver that can be enabled.  The build-time architecture 
dependence has been addressed by building the swr driver twice (avx and avx2), 
and having swr_create_screen check the architecture and load the appropriate 
library.  I’m not completely satisfied with the current solution: since the 
driver is part of the loaded library, we need to link most of mesa into the 
“driver”.  The fix for this seems to be to build just the core swr rasterizer 
as architecture-specific libraries and dlopen/dlsym the fifty or so API entry 
points.  
However this interim solution simplifies things for our users and removes the 
swr specific options from the general Mesa build system. 
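
As a rough illustration of the load-time selection described above, the architecture check might look like the sketch below.  Everything here is an assumption made for illustration: the struct, the function names, and the library file names are hypothetical, and this is not the actual swr_create_screen code.  The CPU query relies on the GCC/Clang x86 builtins.

```cpp
#include <string>

// CPU features relevant to choosing a rasterizer build (hypothetical type).
struct CpuFeatures {
    bool avx;
    bool avx2;
};

// Queries the host CPU.  __builtin_cpu_init/__builtin_cpu_supports are
// GCC/Clang builtins for x86 targets; elsewhere this reports no support.
inline CpuFeatures DetectCpuFeatures()
{
    CpuFeatures f = {false, false};
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    f.avx  = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    return f;
}

// Picks the library to load, preferring the widest supported ISA.
// The file names are placeholders, not the names used by the real build.
inline std::string SelectRasterizerLibrary(const CpuFeatures& f)
{
    if (f.avx2) return "libswr_avx2.so";
    if (f.avx)  return "libswr_avx.so";
    return "";  // no supported instruction set; the caller should fail
}
```

The returned name would then be handed to dlopen, and with the narrower split discussed above the API entry points would be resolved one by one with dlsym.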

> - Using llvm's C++ interface, building against multiple LLVM
> versions. If openswr only supports only limited versions of llvm, then
> the build should bail out accordingly - more comments/suggestions as
> patch(es) hit the ML.

OpenSWR now supports llvm 3.6, 3.7, and 3.8.  We don’t explicitly prevent 
people from trying to use llvm-svn, though as you say the C++ api is not stable 
so they might encounter problems.

> - Will patches porting core openswr functionality from the internal
> tree be part of the public discussions ? The VMWare people have done a
> great thing trying to keep things open, and people have, on the rare
> occasion, found nitpicks in their patches.

Moving patches from the internal rasterizer tree can be scripted at a top 
level, but unfortunately that’s the easy bit of keeping the two in sync when 
changes happen on both sides of the fence.  I can try to track individual 
patches, as far as my git knowledge allows.

> - And last but not least - please split patches sensibly, for your
> submission and further work). The "Initial public Mesa+SWR" touches
> files in quite a few different places.

I’m about to send the patches to the list for review; splitting them into the 
driver, rasterizer, mesa changes, and build system.

> Mildly related - I'll be resending/merging a series with reworks
> things in src/gallium/auxiliary/target-helpers/ so things might clash
> as you rebase your work.

No problem - all part of working with a larger project.  Thanks for the 
heads-up.

-Tim


Re: [Mesa-dev] [PATCH 3/4] Mesa changes for adding OpenSWR

2016-02-18 Thread Rowley, Timothy O
Switching the default renderer to swr was part of our development so that we 
could get users to test our driver without accidentally forgetting to change 
the driver and ending up with llvmpipe.  I was thinking we might be able to 
leave that change in since swr isn’t in the default automake gallium driver 
list, but forgot that scons builds everything by default.  I’ll make that 
change and split this patch into the two pieces as requested.

-Tim


Re: [Mesa-dev] [PATCH 4/4] OpenSWR build changes

2016-02-22 Thread Rowley, Timothy O
Thanks for taking the time to dig into this patch.  Figured I’d address a few 
of the comments now, and work on all your points for the next revision.

> On Feb 22, 2016, at 12:53 PM, Emil Velikov  wrote:
> 
> On 18 February 2016 at 01:53, Tim Rowley  wrote:
> 
> Don't be shy to mention something in the commit message - which of the
> tree targets has been tested. Do they all build/work on linux,
> windows, BSD, other. Pretty much anything that you believe it's
> important. You might also include something that is known to not build
> properly (be that explicitly disabled atm or not).

Will do.  Preview: we have tested only on Linux (CentOS, Ubuntu, and SuSE 
based) and have started on Windows support, but are aware that it does not 
currently work (and is thus disabled).

>> diff --git a/scons/custom.py b/scons/custom.py
>> index 043793b..a5a3410 100644
>> --- a/scons/custom.py
>> +++ b/scons/custom.py
>> @@ -132,7 +132,7 @@ def code_generate(env, script, target, source, command):
>> script_src = env.File(script).srcnode()
>> 
>> # This command creates generated code *in the build directory*.
>> -command = command.replace('$SCRIPT', script_src.path)
>> +command = command.replace('$SCRIPT', script_src.rstr())
> Looks like unrelated change which I'd split out in separate patch.

This change was needed to get the scons variant_dir approach of building a 
directory twice to work.

>> diff --git a/scons/llvm.py b/scons/llvm.py
>> index 1fc8a3f..30742ff 100644
>> --- a/scons/llvm.py
>> +++ b/scons/llvm.py
> 
>> @@ -128,7 +129,8 @@ def generate(env):
>> 'LLVMX86Info', 'LLVMX86AsmPrinter', 'LLVMX86Utils',
>> 'LLVMMCJIT', 'LLVMTarget', 'LLVMExecutionEngine',
>> 'LLVMRuntimeDyld', 'LLVMObject', 'LLVMMCParser',
>> -'LLVMBitReader', 'LLVMMC', 'LLVMCore', 'LLVMSupport'
>> +'LLVMBitReader', 'LLVMMC', 'LLVMCore', 'LLVMSupport',
>> +'LLVMIRReader', 'LLVMAsmParser', 'LLVMX86AsmParser'
> I'm thinking that we'll need these (and perhaps the irreader below)
> for the automake/conf build as well. Set --disable-llvm-shared-libs
> and watch things burn ;-)

Those changes were done for our windows port attempt.  I have noticed the 
reliance on the unified llvm shared lib on linux.

>> --- a/src/gallium/SConscript
>> +++ b/src/gallium/SConscript
>> @@ -20,6 +20,9 @@ SConscript([
>> 'drivers/trace/SConscript',
>> ])
>> 
>> +if env['platform'] != 'windows':
>> +   SConscript('drivers/swr/SConscript')
>> +
> I take it that you're planning to change this at some point in the
> future ? Some of the scon files below explicitly check against msvc.

Yes, this was temporary to disable a configuration which was known broken.

>> --- /dev/null
>> +++ b/src/gallium/drivers/swr/.clang-format
> Not a buildsystem file/change, but it's fine ;-)

Yes, slipped in my splitting of the patchset and should have gone with the 
openswr driver source.

>> diff --git a/src/gallium/drivers/swr/Makefile.am 
>> b/src/gallium/drivers/swr/Makefile.am
>> new file mode 100644
>> index 000..f3a4321
>> --- /dev/null
>> +++ b/src/gallium/drivers/swr/Makefile.am
>> @@ -0,0 +1,37 @@
>> +# Copyright (C) 2015 Intel Corporation.   All Rights Reserved.
>> +#
>> +# Permission is hereby granted, free of charge, to any person obtaining a
>> +# copy of this software and associated documentation files (the "Software"),
>> +# to deal in the Software without restriction, including without limitation
>> +# the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> +# and/or sell copies of the Software, and to permit persons to whom the
>> +# Software is furnished to do so, subject to the following conditions:
>> +#
>> +# The above copyright notice and this permission notice (including the next
>> +# paragraph) shall be included in all copies or substantial portions of the
>> +# Software.
>> +#
>> +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
>> DEALINGS
>> +# IN THE SOFTWARE.
>> +
>> +AUTOMAKE_OPTIONS = subdir-objects
>> +
> We don't need this. It's already enabled globally.

That was most likely copied from another driver, but it looks like we’re the 
holdouts now.

>> diff --git a/src/gallium/drivers/swr/Makefile.sources 
>> b/src/gallium/drivers/swr/Makefile.sources
>> new file mode 100644
>> index 000..4317a85
>> --- /dev/null
>> +++ b/src/gallium/drivers/swr/Makefile.sources
>> @@ -0,0 +1,23 @@
>> +# Copyright (C) 2015 Intel Corporation.   All Rights Reserved.
>> +#
>> +# Permission is hereby granted, free of c

Re: [Mesa-dev] Introducing OpenSWR: High performance software rasterizer

2016-02-22 Thread Rowley, Timothy O

> On Feb 17, 2016, at 7:07 PM, Roland Scheidegger  wrote:
> 
> You could use different functions for avx and avx2 code, and plug the
> right ones in at runtime, as you can link them both just fine. It just
> requires that your code containing avx2 code is in a different compile
> unit to the one containing avx-only code. This way you only really have
> separate compiled code for the functions where there's really a
> difference (obviously, this prevents the compiler from using avx2 on its
> own in the shared parts, but I doubt that's a problem). Albeit if you
> have lots of differences scattered around (the worst would probably be
> different structures based on such difference used everywhere...) this
> might not be very practical (at a first glance, didn't look like it at
> least for avx and avx2).
> Though I'm not actually sure how you would do that for c++ template
> code, maybe it doesn't work as easily...
> In any case, so far for llvmpipe we didn't bother (except for the jitted
> code of course) to optimize for newer instruction sets precisely due to
> it being annoying (certainly prevents you from doing "let's just
> optimize this math here in this little inline function when avx is
> available" - so we still have rasterization functions which emulate
> sse41 _mm_mul_epi32 with _mm_mul_epu32 and so on).

Unfortunately we have avx and avx2 usage in the general swr code, hidden behind 
some macros which emulate the missing avx2 instructions on avx, so there isn’t 
a clear boundary layer inside the swr rasterizer we can load behind.  
Additionally some of the structures will start changing size when we add avx512 
support.

I was thinking that “objcopy --prefix-symbols” might be the answer to the 
problem of creating two versions of the rasterizer that could be linked 
together with the driver, but it does a global rename on all symbols (internal 
and externals like malloc/free/c++ constructors/etc..) leaving unresolvable 
externals.

Maybe a global c++ namespace might work, but I don’t see a nonintrusive way of 
adding that.

-Tim


Re: [Mesa-dev] [PATCH v3 0/6] OpenSWR driver addition

2016-02-29 Thread Rowley, Timothy O
Modest ping: haven’t had any comments on these patches for a few days.

I don’t have freedesktop git write privileges, so once the patches are cleared 
it would be great if someone could push them.

Thanks.

-Tim

> On Feb 24, 2016, at 9:20 PM, Rowley, Timothy O  
> wrote:
> 
> Updating parts 3 through 6 (1 and 2 are still current) based on review
> comments.  Since we're only targeting linux at the moment, I've
> removed the scons build and libgl-gdi changes to simplify the review
> process.  In the future when we get windows working as well we'll get
> the scons build system changes ready for review.
> 
> Tim Rowley (6):
>  OpenSWR driver
>  OpenSWR rasterizer
>  gallium/auxilary: more __cplusplus exports
>  gallium/target-helpers: add OpenSWR driver
>  mesa/build: add OpenSWR to build
>  gallium/docs - add OpenSWR documentation
> 
> configure.ac   |   38 +
> m4/ax_cxx_compile_stdcxx.m4|  558 ++
> src/gallium/Makefile.am|6 +
> src/gallium/auxiliary/gallivm/lp_bld_flow.h|7 +
> src/gallium/auxiliary/gallivm/lp_bld_init.h|7 +
> src/gallium/auxiliary/gallivm/lp_bld_sample.h  |6 +
> src/gallium/auxiliary/gallivm/lp_bld_tgsi.h|8 +
> .../auxiliary/target-helpers/inline_sw_helper.h|   12 +-
> src/gallium/auxiliary/util/u_dl.h  |6 +
> src/gallium/docs/source/drivers/openswr.rst|   21 +
> src/gallium/docs/source/drivers/openswr/faq.rst|  141 +
> src/gallium/docs/source/drivers/openswr/knobs.rst  |  114 +
> .../docs/source/drivers/openswr/profiling.rst  |   67 +
> src/gallium/docs/source/drivers/openswr/usage.rst  |   38 +
> src/gallium/drivers/swr/.clang-format  |   64 +
> src/gallium/drivers/swr/Makefile.am|   31 +
> src/gallium/drivers/swr/Makefile.sources   |   23 +
> src/gallium/drivers/swr/Makefile.sources-arch  |  111 +
> src/gallium/drivers/swr/avx/Makefile.am|   99 +
> src/gallium/drivers/swr/avx2/Makefile.am   |   99 +
> .../drivers/swr/rasterizer/common/containers.hpp   |  208 +
> .../drivers/swr/rasterizer/common/formats.cpp  | 5469 
> .../drivers/swr/rasterizer/common/formats.h|  251 +
> src/gallium/drivers/swr/rasterizer/common/isa.hpp  |  235 +
> src/gallium/drivers/swr/rasterizer/common/os.h |  221 +
> .../swr/rasterizer/common/rdtsc_buckets.cpp|  188 +
> .../drivers/swr/rasterizer/common/rdtsc_buckets.h  |  229 +
> .../swr/rasterizer/common/rdtsc_buckets_shared.h   |  167 +
> .../drivers/swr/rasterizer/common/simdintrin.h |  787 +++
> .../drivers/swr/rasterizer/common/swr_assert.cpp   |  238 +
> .../drivers/swr/rasterizer/common/swr_assert.h |  109 +
> src/gallium/drivers/swr/rasterizer/core/api.cpp| 1511 ++
> src/gallium/drivers/swr/rasterizer/core/api.h  |  500 ++
> src/gallium/drivers/swr/rasterizer/core/arena.cpp  |  166 +
> src/gallium/drivers/swr/rasterizer/core/arena.h|   69 +
> .../drivers/swr/rasterizer/core/backend.cpp| 1899 +++
> src/gallium/drivers/swr/rasterizer/core/backend.h  |   59 +
> src/gallium/drivers/swr/rasterizer/core/blend.h|  318 ++
> src/gallium/drivers/swr/rasterizer/core/clip.cpp   |  201 +
> src/gallium/drivers/swr/rasterizer/core/clip.h |  868 
> src/gallium/drivers/swr/rasterizer/core/context.h  |  495 ++
> .../drivers/swr/rasterizer/core/depthstencil.h |  245 +
> src/gallium/drivers/swr/rasterizer/core/fifo.hpp   |  136 +
> .../swr/rasterizer/core/format_conversion.h|  196 +
> .../drivers/swr/rasterizer/core/format_traits.h| 3548 +
> .../drivers/swr/rasterizer/core/format_types.h | 1075 
> .../drivers/swr/rasterizer/core/frontend.cpp   | 2345 +
> src/gallium/drivers/swr/rasterizer/core/frontend.h |  327 ++
> src/gallium/drivers/swr/rasterizer/core/knobs.h|  142 +
> .../drivers/swr/rasterizer/core/knobs_init.h   |   98 +
> .../drivers/swr/rasterizer/core/multisample.cpp|   51 +
> .../drivers/swr/rasterizer/core/multisample.h  |  620 +++
> src/gallium/drivers/swr/rasterizer/core/pa.h   | 1208 +
> src/gallium/drivers/swr/rasterizer/core/pa_avx.cpp | 1177 +
> .../drivers/swr/rasterizer/core/rasterizer.cpp | 1393 +
> .../drivers/swr/rasterizer/core/rasterizer.h   |   35 +
> .../drivers/swr/rasterizer/core/rdtsc_core.cpp |   91 +
> .../drivers/swr/rasterizer/core/rdtsc_core.h   |  177 +
> src/gallium/drivers/swr/rasterizer/core/state.h| 1027 
> .../drivers/swr/rasterizer/core/tessellator.h  |   88 +
> .../drivers/swr/rasterizer/core/threads.cpp|  962 
> src/gallium/drivers/swr/rasteri

Re: [Mesa-dev] [PATCH v3 0/6] OpenSWR driver addition

2016-02-29 Thread Rowley, Timothy O

> On Feb 29, 2016, at 3:47 PM, Roland Scheidegger wrote:
> 
> On 29.02.2016 at 22:07, Rowley, Timothy O wrote:
>> Modest ping: haven’t had any comments on these patches for a few
>> days.
> Patches look ok to me (for the parts I looked at and commented on).
> 
>> I don’t have freedesktop git write privileges, so once the patches
>> are cleared it would be great if someone could push them.
> 
> I don't think that's going to work. A driver needs to be maintained, and
> a complete driver where the maintainer doesn't have commit access sounds
> like a bad idea to me. You should probably apply for git access, unless
> you can find someone else who wants to work on the driver…

I’m willing to be the maintainer, but was trying to follow the path usually 
taken towards commit access: build up a history of patches and then ask for 
access.  I can request early access if you think this will help the process.

-Tim


Re: [Mesa-dev] [PATCH] gallium/swr: explicity use llvm legacy FunctionPassManager

2016-03-03 Thread Rowley, Timothy O

> On Mar 3, 2016, at 11:54 AM, Kai Wasserbäch  
> wrote:
> 
> Tim Rowley wrote on 03.03.2016 18:20:
>> swr uses the legacy FunctionPassManager for llvm-3.6 compatibility,
>> but a change to llvm headers in 3.9 includes the new version as well.
>> Explicitly use the legacy version to prevent ambiguity.
>> ---
>> src/gallium/drivers/swr/rasterizer/jitter/JitManager.h |  1 -
>> src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp|  8 +++-
>> src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp| 14 
>> --
>> .../drivers/swr/rasterizer/jitter/streamout_jit.cpp|  8 +++-
>> 4 files changed, 26 insertions(+), 5 deletions(-)
>> 
>> --- a/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
>> +++ b/src/gallium/drivers/swr/rasterizer/jitter/blend_jit.cpp
>> @@ -717,7 +717,13 @@ struct BlendJit : public Builder
>> 
>> JitManager::DumpToFile(blendFunc, "");
>> 
>> -FunctionPassManager passes(JM()->mpCurrentModule);
>> +#if LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR == 6
> 
> Why not something like
> 
> #IF HAVE_LLVM == 0x0306
> 
> like radeonsi is using? (Same applies below.)

I like the cleanness of that method of version checking, but the internal 
customer of the swr rasterizer has a build system which doesn’t create a 
HAVE_LLVM, so we end up using only what’s provided by llvm’s llvm-config.h.
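
A sketch of how the two styles of check relate, assuming only the LLVM_VERSION_MAJOR and LLVM_VERSION_MINOR macros from llvm-config.h are available.  SWR_LLVM_VERSION and kIsLlvm36 are hypothetical names invented for this example, and the fallback defines exist only so the fragment compiles without LLVM installed.

```cpp
// llvm-config.h normally provides these; the fallbacks below are stand-ins
// so that this sketch builds on its own.
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 3
#endif
#ifndef LLVM_VERSION_MINOR
#define LLVM_VERSION_MINOR 6
#endif

// Hypothetical helper: packs major/minor as 0xMMmm, the same layout as the
// HAVE_LLVM == 0x0306 comparison quoted above.
#define SWR_LLVM_VERSION ((LLVM_VERSION_MAJOR << 8) | LLVM_VERSION_MINOR)

// Version-dependent code can then branch on a single value.
#if SWR_LLVM_VERSION == 0x0306
constexpr bool kIsLlvm36 = true;
#else
constexpr bool kIsLlvm36 = false;
#endif
```

A misspelled macro name in an #if silently evaluates to 0, so deriving one packed value in a single place also reduces the number of spots where such a typo can hide.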

-Tim


Re: [Mesa-dev] [PATCH] gallium/swr: fix issues preventing 32-bit build

2016-03-07 Thread Rowley, Timothy O

> On Mar 4, 2016, at 3:26 PM, Emil Velikov  wrote:
> 
> On 4 March 2016 at 19:28, Tim Rowley  wrote:
>> 
>> diff --git a/src/gallium/drivers/swr/rasterizer/common/os.h 
>> b/src/gallium/drivers/swr/rasterizer/common/os.h
>> index 736d298..522ae0d 100644
>> --- a/src/gallium/drivers/swr/rasterizer/common/os.h
>> +++ b/src/gallium/drivers/swr/rasterizer/common/os.h
>> @@ -81,7 +81,6 @@ typedef CARD8 BOOL;
>> typedef wchar_tWCHAR;
>> typedef uint16_t   UINT16;
>> typedef intINT;
>> -typedef int INT32;
>> typedef unsigned int   UINT;
>> typedef uint32_t   UINT32;
>> typedef uint64_t   UINT64;
> If you can remove this abstraction and use plain C types that will be
> amazing. With future commits of course.

There was a pass over the tree removing these types a while back, but 
unfortunately the typedefs remained and some uses crept back in.  I'm working on 
cleaning this up.

-Tim
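
One way to make such a cleanup safer, sketched here under assumptions, is to pin each legacy alias to its <cstdint> equivalent with static_assert before replacing its uses. The alias names mirror the quoted os.h hunk; combine_words and legacy_caller are hypothetical helpers used only to show that code written with plain types accepts the legacy aliases unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Legacy-style aliases, as they appear in the quoted os.h hunk.
typedef uint16_t UINT16;
typedef uint32_t UINT32;
typedef uint64_t UINT64;

// If these hold, swapping an alias for the plain type cannot change behavior.
static_assert(std::is_same<UINT16, uint16_t>::value, "UINT16 must be uint16_t");
static_assert(std::is_same<UINT32, uint32_t>::value, "UINT32 must be uint32_t");
static_assert(std::is_same<UINT64, uint64_t>::value, "UINT64 must be uint64_t");

// Hypothetical helper already converted to plain fixed-width types.
inline uint64_t combine_words(uint32_t high, uint32_t low)
{
    return (static_cast<uint64_t>(high) << 32) | low;
}

// Hypothetical caller that has not been converted yet; it still compiles
// and behaves identically because the aliases name the same types.
inline UINT64 legacy_caller(UINT32 high, UINT32 low)
{
    UINT64 result = combine_words(high, low);
    assert((result >> 32) == high);
    return result;
}
```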
 



Re: [Mesa-dev] [PATCH] gallium/swr: update rasterizer (532172)

2016-03-22 Thread Rowley, Timothy O

> On Mar 22, 2016, at 3:51 PM, Justen, Jordan L  
> wrote:
> 
> What does 532172 in the subject refer to?

swr rasterizer development happens in another source control system.  532172 is 
a revision id to checkpoint where we’ve pushed the changes publicly.

> From this commit message, it seems clear that this single patch is
> doing a whole lot. Usually that's a good sign that it should be split
> into multiple patches.
> 
> However, since this is only changing your driver, you can probably
> take any sort of patches that you like. :)
> 
> There is arguably little value to sending out a patch like this, since
> it is very difficult to review. In other words, perhaps if you are
> going to make big, unreviewable patches like this that only change
> your driver, then you might as well just push them straight away.
> 
> (But, it would be better, in my opinion, to try to split up the
> changes and let them be reviewed.)

Yes, there’s a lot in this patch.  I froze the public version of the rasterizer 
when I began the upstreaming process mid February, so this is syncing up with 
about a month’s worth of development.

I also have this change as a series of 81 commits.  Not sure if that would be 
preferable to the community or if people would be interested in reviewing the 
series, as issues with early commits might already be addressed later in the 
patch set.

-Tim




Re: [Mesa-dev] [PATCH] gallium/swr: update rasterizer (532172)

2016-03-23 Thread Rowley, Timothy O

> On Mar 23, 2016, at 12:52 AM, Justen, Jordan L  
> wrote:
> 
> On 2016-03-22 20:55:10, Rowley, Timothy O wrote:
>> 
>> Yes, there’s a lot in this patch. I froze the public version of the
>> rasterizer when I began the upstreaming process mid February, so
>> this is syncing up with about a month’s worth of development.
>> 
>> I also have this change as a series of 81 commits. Not sure if that
>> would be preferable to the community or if people would be
>> interested in reviewing the series, as issues with early commits
>> might already be addressed later in the patch set.
> 
> There seem to be some things working against community code review.
> 
> * Expected broken commits earlier in the series (We would normally ask
>  that commits are cleaned up before posting them.)

There aren’t known broken commits in the series (besides the rare source 
control “whoops” which can happen with pretty much any development workflow, 
and which I can squash in the commit series pushed upstream); it’s just that, 
with a backlog of changes, improvements which might have been suggested for an 
early commit might already be addressed later in the series.

> * External development (What would happen to any code review asking
>  for reworks, given that the patches are already merged elsewhere?)

We’ve only had to deal with this in a limited fashion to this point (back when 
we were developing on github).  Depending on the impact of the suggested 
changes, I’d either do it myself or work with an internal developer to 
implement them, and amend the commit to include the changes.  Keeping the two 
repositories in sync is non-trivial, and I expect there might be times where a 
review for the rasterizer might have to be addressed as “noted, will try to 
address in the future” if changes become stacked to where the commit history 
can’t be reordered/squashed before pushing.

> * A large backlog of changes. :)

I hope to do updates on a more regular basis; the upstreaming work and the 
follow-up on issues it raised caused the long gap this time and the resulting backlog.

> I still think it would be better to see the 81 commits split up in the
> history as long as they won't cause problems for others. Since most
> people are unlikely to be building openswr, I don't think the commits
> will affect them.

Once we sort out some more of the build issues we’d like to propose adding 
openswr to the default driver build list, but as you say for now people aren’t 
building this code unless they intentionally turned it on.

> We rarely use merges, but perhaps it is appropriate since openswr is
> developed externally. You could start a branch at the last openswr
> commit, add your 81 commits. Then you could merge the resulting branch
> into master.

I’ve been respecting the “no merges” mesa git workflow.  I have a script which 
can convert from our internal repository to git commits relative to master head.

- Tim



Re: [Mesa-dev] [PATCH] gallium/swr: update rasterizer (532172)

2016-03-23 Thread Rowley, Timothy O

> On Mar 23, 2016, at 12:52 AM, Kenneth Graunke  wrote:
> 
> That's an awkward situation we've not run into before.
> 
> If the code is going to live in the upstream Mesa git repository, then
> it seems like the best long term plan is to reverse the workflow: make
> upstream Mesa the canonical repository, do development upstream, and
> pull changes from upstream into any internal repositories.
> 
> Obviously, that's a huge process change - presumably you have a bunch
> of people working in some Intel perforce system - but working in the
> public is very beneficial.  It's also the mark of a true open source
> project, rather than simply "available source".

While that situation would be nice, the swr rasterizer is a subset of an 
internal project, and what is upstreamed publicly is not just a straight copy 
of our repository.  Moving the rasterizer’s “home” to Mesa involves 
some large technical and workflow challenges.

-Tim



Re: [Mesa-dev] [PATCH] gallium/swr: update rasterizer (532172)

2016-03-23 Thread Rowley, Timothy O

> On Mar 23, 2016, at 2:19 PM, Tom Stellard  wrote:
> 
> On Wed, Mar 23, 2016 at 04:53:29PM +0000, Rowley, Timothy O wrote:
>> 
>> While that situation would be nice, the swr rasterizer is a subset of an 
>> internal project, and what is upstreamed publicly is not just a straight 
>> copy of our repository.  Moving the rasterizer’s “home” to Mesa 
>> involves some large technical and workflow challenges.
> 
> How much testing do you do on the version of swr that's in Mesa?

The internal version undergoes extensive continuous integration testing.  The 
version in Mesa isn’t currently subjected to quite the same testing; we check 
for regressions on the VTK test suite regularly, do targeted testing on our 
major target applications, and currently run piglit only infrequently.

-Tim




Re: [Mesa-dev] [PATCH] swr: [rasterizer] Correctly select optimized primitive assembly.

2016-05-25 Thread Rowley, Timothy O
Reviewed-by: Tim Rowley 

> On May 24, 2016, at 3:00 PM, Bruce Cherniak  wrote:
> 
> Indexed primitives were always using cut-aware primitive assembly,
> whether primitive_restart was enabled or not.  Correctly pass down
> primitive_restart and select optimized PA when possible.
> ---
> src/gallium/drivers/swr/rasterizer/core/api.cpp|2 ++
> .../drivers/swr/rasterizer/core/frontend.cpp   |6 --
> src/gallium/drivers/swr/rasterizer/core/frontend.h |1 +
> src/gallium/drivers/swr/rasterizer/core/pa.h   |4 ++--
> src/gallium/drivers/swr/rasterizer/core/state.h|3 ++-
> src/gallium/drivers/swr/swr_draw.cpp   |6 ++
> src/gallium/drivers/swr/swr_state.cpp  |4 
> 7 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/api.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/api.cpp
> index 8e0c1e1..2e6f8b3 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/api.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/api.cpp
> @@ -1069,6 +1069,7 @@ void DrawInstanced(
> pDC->FeWork.type = DRAW;
> pDC->FeWork.pfnWork = GetProcessDrawFunc(
> false,  // IsIndexed
> +false, // bEnableCutIndex
> pState->tsState.tsEnable,
> pState->gsState.gsEnable,
> pState->soState.soEnable,
> @@ -1202,6 +1203,7 @@ void DrawIndexedInstance(
> pDC->FeWork.type = DRAW;
> pDC->FeWork.pfnWork = GetProcessDrawFunc(
> true,   // IsIndexed
> +pState->frontendState.bEnableCutIndex,
> pState->tsState.tsEnable,
> pState->gsState.gsEnable,
> pState->soState.soEnable,
> diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp 
> b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
> index d6643c6..ef90a24 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
> +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
> @@ -1159,6 +1159,7 @@ static void TessellationStages(
> /// @param pUserData - Pointer to DRAW_WORK
> template <
> typename IsIndexedT,
> +typename IsCutIndexEnabledT,
> typename HasTessellationT,
> typename HasGeometryShaderT,
> typename HasStreamOutT,
> @@ -1283,7 +1284,7 @@ void ProcessDraw(
> }
> 
> // choose primitive assembler
> -PA_FACTORY<IsIndexedT> paFactory(pDC, state.topology, work.numVerts);
> +PA_FACTORY<IsIndexedT, IsCutIndexEnabledT> paFactory(pDC, 
> state.topology, work.numVerts);
> PA_STATE& pa = paFactory.GetPA();
> 
> /// @todo: temporarily move instance loop in the FE to ensure SO ordering
> @@ -1434,12 +1435,13 @@ struct FEDrawChooser
> // Selector for correct templated Draw front-end function
> PFN_FE_WORK_FUNC GetProcessDrawFunc(
> bool IsIndexed,
> +bool IsCutIndexEnabled,
> bool HasTessellation,
> bool HasGeometryShader,
> bool HasStreamOut,
> bool HasRasterization)
> {
> -return TemplateArgUnroller<FEDrawChooser>::GetFunc(IsIndexed, 
> HasTessellation, HasGeometryShader, HasStreamOut, HasRasterization);
> +return TemplateArgUnroller<FEDrawChooser>::GetFunc(IsIndexed, 
> IsCutIndexEnabled, HasTessellation, HasGeometryShader, HasStreamOut, 
> HasRasterization);
> }
> 
> 
> diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.h 
> b/src/gallium/drivers/swr/rasterizer/core/frontend.h
> index e1b0400..dfd3987 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/frontend.h
> +++ b/src/gallium/drivers/swr/rasterizer/core/frontend.h
> @@ -322,6 +322,7 @@ uint32_t NumVertsPerPrim(PRIMITIVE_TOPOLOGY topology, 
> bool includeAdjVerts);
> // ProcessDraw front-end function.  All combinations of parameter values are 
> available
> PFN_FE_WORK_FUNC GetProcessDrawFunc(
> bool IsIndexed,
> +bool IsCutIndexEnabled,
> bool HasTessellation,
> bool HasGeometryShader,
> bool HasStreamOut,
> diff --git a/src/gallium/drivers/swr/rasterizer/core/pa.h 
> b/src/gallium/drivers/swr/rasterizer/core/pa.h
> index c98ea14..6aa73c1 100644
> --- a/src/gallium/drivers/swr/rasterizer/core/pa.h
> +++ b/src/gallium/drivers/swr/rasterizer/core/pa.h
> @@ -1149,14 +1149,14 @@ private:
> 
> // Primitive Assembler factory class, responsible for creating and 
> initializing the correct assembler
> // based on state.
> -template <typename IsIndexedT>
> +template <typename IsIndexedT, typename IsCutIndexEnabledT>
> struct PA_FACTORY
> {
> PA_FACTORY(DRAW_CONTEXT* pDC, PRIMITIVE_TOPOLOGY in_topo, uint32_t 
> numVerts) : topo(in_topo)
> {
> #if KNOB_ENABLE_CUT_AWARE_PA == TRUE
> const API_STATE& state = GetApiState(pDC);
> -if ((IsIndexedT::value && (
> +if ((IsIndexedT::value && IsCutIndexEnabledT::value && (
> topo == TOP_TRIANGLE_STRIP || topo == TOP_POINT_LIST ||
> topo == TOP_LINE_LIST || topo == TOP_LINE_STRIP ||
> topo == TOP_TRIANGLE_LIST || topo == TOP_LINE_LIST_ADJ ||
> diff --git a/src/gallium/drivers/swr/rasterizer/core/state.h 
> b/src/gallium/drivers/swr/rasterizer/core/state.h
> index f4813e4..5156c6b 100644
> -
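
The patch above threads a new runtime flag through to a compile-time template argument so that the primitive assembler choice can be resolved per specialization. The following is a rough, self-contained sketch of that general technique (unrolling runtime bools into std::true_type/std::false_type template arguments). UnrollerSketch, ProcessDrawSketch and GetProcessDrawFuncSketch are hypothetical stand-ins, not the actual swr TemplateArgUnroller or ProcessDraw, and the real PA_FACTORY also checks the topology and a build knob, which this sketch omits.

```cpp
#include <cassert>
#include <type_traits>

using WorkFunc = int (*)();

// Stand-in for the templated draw function. Returns 1 when the cut-aware
// path would be selected and 0 when the optimized path would be selected.
template <typename IsIndexedT, typename IsCutIndexEnabledT>
int ProcessDrawSketch()
{
    return (IsIndexedT::value && IsCutIndexEnabledT::value) ? 1 : 0;
}

// Converts a list of runtime bools into template arguments, one at a time.
template <typename... ArgsT>
struct UnrollerSketch
{
    // Terminal case: every bool has been turned into a type.
    static WorkFunc GetFunc() { return &ProcessDrawSketch<ArgsT...>; }

    // Recursive case: branch on the first bool, append the matching type,
    // and continue with the remaining bools.
    template <typename... RestT>
    static WorkFunc GetFunc(bool first, RestT... rest)
    {
        return first
            ? UnrollerSketch<ArgsT..., std::true_type>::GetFunc(rest...)
            : UnrollerSketch<ArgsT..., std::false_type>::GetFunc(rest...);
    }
};

// Selector mirroring the shape of GetProcessDrawFunc, reduced to two flags.
inline WorkFunc GetProcessDrawFuncSketch(bool isIndexed, bool isCutIndexEnabled)
{
    WorkFunc fn = UnrollerSketch<>::GetFunc(isIndexed, isCutIndexEnabled);
    assert(fn != nullptr);
    return fn;
}
```

In this sketch, an indexed draw with primitive restart disabled resolves to a specialization that skips the cut-aware path, which is the behavior the commit message describes.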

Re: [Mesa-dev] [PATCH v3] swr: implement clipPlanes/clipVertex/clipDistance/cullDistance

2016-05-26 Thread Rowley, Timothy O

> On May 25, 2016, at 9:16 PM, Ilia Mirkin  wrote:
> 
> On Wed, May 25, 2016 at 10:03 PM, Tim Rowley  
> wrote:
>> v2: only load the clip vertex once
>> 
>> v3: fix clip enable logic, add cullDistance
>> ---
>> docs/GL3.txt   |  2 +-
>> src/gallium/drivers/swr/swr_context.h  |  2 ++
>> src/gallium/drivers/swr/swr_screen.cpp |  3 +-
>> src/gallium/drivers/swr/swr_shader.cpp | 63 
>> ++
>> src/gallium/drivers/swr/swr_shader.h   |  4 +++
>> src/gallium/drivers/swr/swr_state.cpp  | 24 -
>> 6 files changed, 95 insertions(+), 3 deletions(-)
>> 
>> diff --git a/docs/GL3.txt b/docs/GL3.txt
>> index 555a9be..5965f25 100644
>> --- a/docs/GL3.txt
>> +++ b/docs/GL3.txt
>> @@ -211,7 +211,7 @@ GL 4.5, GLSL 4.50:
>>   GL_ARB_ES3_1_compatibilityDONE (nvc0, radeonsi)
>>   GL_ARB_clip_control   DONE (i965, nv50, 
>> nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>>   GL_ARB_conditional_render_invertedDONE (i965, nv50, 
>> nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
>> -  GL_ARB_cull_distance  DONE (i965, nv50, 
>> nvc0, llvmpipe, softpipe)
>> +  GL_ARB_cull_distance  DONE (i965, nv50, 
>> nvc0, llvmpipe, softpipe, swr)
>>   GL_ARB_derivative_control DONE (i965, nv50, 
>> nvc0, r600, radeonsi)
>>   GL_ARB_direct_state_accessDONE (all drivers)
>>   GL_ARB_get_texture_sub_image  DONE (all drivers)
>> diff --git a/src/gallium/drivers/swr/swr_context.h 
>> b/src/gallium/drivers/swr/swr_context.h
>> index a7383bb..75ecae3 100644
>> --- a/src/gallium/drivers/swr/swr_context.h
>> +++ b/src/gallium/drivers/swr/swr_context.h
>> @@ -89,6 +89,8 @@ struct swr_draw_context {
>>swr_jit_texture texturesFS[PIPE_MAX_SHADER_SAMPLER_VIEWS];
>>swr_jit_sampler samplersFS[PIPE_MAX_SAMPLERS];
>> 
>> +   float userClipPlanes[PIPE_MAX_CLIP_PLANES][4];
>> +
>>SWR_SURFACE_STATE renderTargets[SWR_NUM_ATTACHMENTS];
>> };
>> 
>> diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
>> b/src/gallium/drivers/swr/swr_screen.cpp
>> index 0772274..7851346 100644
>> --- a/src/gallium/drivers/swr/swr_screen.cpp
>> +++ b/src/gallium/drivers/swr/swr_screen.cpp
>> @@ -333,6 +333,8 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
>> param)
>>case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
>>case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
>>   return 1;
>> +   case PIPE_CAP_CULL_DISTANCE:
>> +  return 1;
>>case PIPE_CAP_TGSI_TXQS:
>>case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
>>case PIPE_CAP_SHAREABLE_SHADERS:
>> @@ -358,7 +360,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
>> param)
>>case PIPE_CAP_PCI_DEVICE:
>>case PIPE_CAP_PCI_FUNCTION:
>>case PIPE_CAP_FRAMEBUFFER_NO_ATTACHMENT:
>> -   case PIPE_CAP_CULL_DISTANCE:
>>case PIPE_CAP_PRIMITIVE_RESTART_FOR_PATCHES:
>>   return 0;
>>}
>> diff --git a/src/gallium/drivers/swr/swr_shader.cpp 
>> b/src/gallium/drivers/swr/swr_shader.cpp
>> index f693f51..25ea7ae 100644
>> --- a/src/gallium/drivers/swr/swr_shader.cpp
>> +++ b/src/gallium/drivers/swr/swr_shader.cpp
>> @@ -40,6 +40,9 @@
>> #include "swr_state.h"
>> #include "swr_screen.h"
>> 
>> +static unsigned
>> +locate_linkage(ubyte name, ubyte index, struct tgsi_shader_info *info);
>> +
>> bool operator==(const swr_jit_fs_key &lhs, const swr_jit_fs_key &rhs)
>> {
>>return !memcmp(&lhs, &rhs, sizeof(lhs));
>> @@ -120,6 +123,11 @@ swr_generate_vs_key(struct swr_jit_vs_key &key,
>> {
>>memset(&key, 0, sizeof(key));
>> 
>> +   key.clip_plane_mask = ctx->rasterizer->clip_plane_enable;
>> +   key.clip_distance_mask = swr_vs->info.base.clipdist_writemask;
>> +   key.cull_distance_mask = swr_vs->info.base.culldist_writemask;
>> +   key.writes_clipvertex = swr_vs->info.base.writes_clipvertex;
>> +
>>swr_generate_sampler_key(swr_vs->info, ctx, PIPE_SHADER_VERTEX, key);
>> }
>> 
>> @@ -252,6 +260,61 @@ BuilderSWR::CompileVS(struct swr_context *ctx, 
>> swr_jit_vs_key &key)
>>   }
>>}
>> 
>> +   if (ctx->rasterizer->clip_plane_enable) {
> 
> I think you want if (ctx->rasterizer->clip_plane_enable &&
> (swr_vs->info.base.clipdist_writemask |
> swr_vs->info.base.culldist_writemask) == 0)
> 
> Note that for culling, clip_plane_enable won't be set. That's only for
> clip planes and cull distances.
> 

I think the test actually needs to be “if (ctx->rasterizer->clip_plane_enable 
|| swr_vs->info.base.culldist_writemask)” since I need to do the output 
rewiring for both clip and cull.
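
A minimal sketch of that condition as a standalone predicate follows. The field names are taken from the quoted patch, but VsInfoSketch and needs_clip_cull_rewiring are hypothetical stand-ins for the real tgsi_shader_info and driver code, and the 8-bit mask limit is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the relevant fields of the vertex shader info.
struct VsInfoSketch
{
    uint8_t clipdist_writemask;
    uint8_t culldist_writemask;
};

// True when the vertex shader outputs need rewiring: either user clip
// planes are enabled in the rasterizer state, or the shader writes any
// cull distance (in which case clip_plane_enable may be zero).
inline bool needs_clip_cull_rewiring(unsigned clip_plane_enable,
                                     const VsInfoSketch &info)
{
    // Assumption for this sketch: at most 8 clip planes, one bit each.
    assert(clip_plane_enable < 256u);
    return clip_plane_enable != 0 || info.culldist_writemask != 0;
}
```

Note that with this test a shader that writes only cull distances still takes the rewiring path even though clip_plane_enable is zero.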

>> +  unsigned clip_mask = ctx->rasterizer->clip_plane_enable;
>> +
>> +  unsigned cv;
>> +  if (swr_vs->info.base.writes_clipvertex) {
>> + cv = 1 + locate_linkage(TGSI_SEMANTIC_CLIPVERTEX, 0,
>> + &swr_vs->info.base);
>> +  } else {
>> + for (int i = 0; i < PIPE_MAX_SHADER_
