I know this is an old thread. I completely missed it the first time,
but recently rediscovered it after reading
http://www.phoronix.com/scan.php?page=news_item&px=Vulkan-CPU-Repository,
and perhaps it's not too late for a couple of comments, FWIW.
On 13/02/17 02:17, Jacob Lifshay wrote:
forgot to add mesa-dev when I sent.
---------- Forwarded message ----------
From: "Jacob Lifshay" <programmerj...@gmail.com>
Date: Feb 12, 2017 6:16 PM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Dave Airlie" <airl...@gmail.com>
Cc:
On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com> wrote:
> I'm assuming that control barriers in Vulkan are identical to
> barriers across a work-group in OpenCL. I was going to have a
> work-group be a single OS thread, with the different work-items
> mapped to SIMD lanes. If we need to have additional scheduling, I
> have written a JavaScript compiler that supports generator
> functions, so I mostly know how to write an LLVM pass to implement
> that. I was planning on writing the shader compiler using LLVM,
> using the whole-function-vectorization pass I will write, and using
> the pre-existing SPIR-V to LLVM translation layer. I would also
> write some LLVM passes to translate texture reads and the like into
> basic vector ops.
Well, the problem is that the number of work-groups that get launched
could be quite high, and this can cause a large overhead in the number
of host threads that have to be launched. There was some discussion of
this in the mesa-dev archives back when I added softpipe compute
shaders.
I would start a thread for each CPU, then have each thread run the
compute shader a number of times, instead of having a thread per
shader invocation.
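For illustration, the work distribution described above might be
sketched as follows (all names are hypothetical, not from any Mesa
code): each of a fixed pool of OS threads takes a contiguous range of
work-groups, rather than spawning one thread per invocation.

```c
/* Hypothetical sketch: split num_groups work-groups across num_threads
 * worker threads.  Each thread then loops over its own range, running
 * the compute shader once per work-group. */
static void workgroup_range(unsigned num_groups, unsigned num_threads,
                            unsigned thread_id,
                            unsigned *first, unsigned *count)
{
    unsigned per_thread = num_groups / num_threads;
    unsigned remainder  = num_groups % num_threads;

    /* Earlier threads each take one extra group until the remainder
     * is used up, so the ranges tile [0, num_groups) exactly. */
    *first = thread_id * per_thread +
             (thread_id < remainder ? thread_id : remainder);
    *count = per_thread + (thread_id < remainder ? 1 : 0);
}
```

The point is that the number of OS threads stays bounded by the CPU
count no matter how many work-groups the application dispatches.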
At least for llvmpipe, last time I looked into this, using OS green
threads seemed like a simple, non-intrusive method of dealing with
this --
https://lists.freedesktop.org/archives/mesa-dev/2016-April/114790.html
-- but it sounds like LLVM coroutines can handle this more effectively.
> I have a prototype rasterizer, however I haven't implemented binning
> for triangles yet or implemented interpolation. Currently, it can
> handle triangles in 3D homogeneous coordinates and calculate edge
> equations.
> https://github.com/programmerjake/tiled-renderer
> A previous 3D renderer that doesn't implement any vectorization and
> has OpenGL 1.x level functionality:
> https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
Well, I think we already have a completely fine rasterizer, binning,
and whatever else in the llvmpipe code base. I'd much rather any
Mesa-based project didn't throw all of that away; there is no reason
the same swrast backend couldn't be abstracted to be used for both GL
and Vulkan, and introducing another just because it's interesting
isn't a great fit for long-term project maintenance.
If there are improvements to llvmpipe that need to be made, then that
is something to possibly consider, but I'm not sure why a swrast
Vulkan needs a from-scratch rasterizer implemented. For a project that
is so large in scope, I'd think reusing that code would be of some
use, since most of the fun stuff is all the texture sampling etc.
I actually think implementing the rasterization algorithm is the best
part. I wanted the rasterization algorithm to be included in the
shaders: e.g. triangle setup and binning would be tacked on to the end
of the vertex shader, parameter interpolation and early-z tests would
be tacked on to the beginning of the fragment shader, and blending on
to the end. That way, LLVM could do more specialization and
instruction scheduling than is possible in llvmpipe now.
Parameter interpolation, early-z test, and blending *are* tacked onto
llvmpipe's fragment shaders.
I don't see how to effectively tack triangle setup onto the vertex
shader: the vertex shader applies to vertices, whereas triangle setup
and binning apply to primitives. Usually, each vertex gets transformed
only once with llvmpipe, no matter how many triangles refer to that
vertex. The only way to tack triangle setup onto vertex shading would
be to process vertices a primitive at a time. Of course, one could add
an if-statement to skip reprocessing a vertex that was already
processed, but then you have race conditions, and no benefit from
inlining.
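A toy count makes the difference concrete (both helpers below are
hypothetical, purely for illustration): with an indexed quad made of
two triangles sharing an edge, per-vertex processing transforms 4
vertices, while folding the transform into per-primitive setup would
process 3 vertices per triangle, i.e. 6.

```c
/* Shader invocations if the vertex transform runs once per primitive
 * vertex (transform folded into triangle setup). */
static int count_per_primitive(int num_tris)
{
    return num_tris * 3;
}

/* Shader invocations if each unique vertex is transformed once, as in
 * llvmpipe's per-vertex model. */
static int count_per_vertex(const int *indices, int num_indices)
{
    char processed[64] = {0};  /* assumes vertex indices < 64 here */
    int count = 0;
    for (int i = 0; i < num_indices; i++)
        if (!processed[indices[i]]) {
            processed[indices[i]] = 1;
            count++;
        }
    return count;
}
```

Guarding the per-primitive path with a `processed` flag like the above
is exactly what races once primitives are handled on multiple threads.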
And I'm afraid that tacking on rasterization too is one of those
things that sound great on paper but turn out quite bad in practice.
And I speak from experience: in fact, llvmpipe had the last step of
rasterization bolted onto the fragment shaders for some time, but we
took it out because it was _slower_.
The issue is that if you bolt rasterization onto the shader body, you
either:
- inline in the shader body the code for the maximum number of planes
(which is 7: the 3 sides of the triangle plus the 4 sides of a scissor
rect), and waste CPU cycles going through all of those tests, even
though most of the time many of them aren't needed;
- or you generate if/for blocks for each plane, so you only do the
needed tests, but then you have branch prediction issues...
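The first option can be illustrated with a small sketch (hypothetical
names, not llvmpipe code): an inside test that always evaluates all 7
half-plane edge equations, whether this particular primitive needs
them or not.

```c
/* Edge function: a*x + b*y + c >= 0 means the point is inside this
 * half-plane.  Three of these come from the triangle edges and four
 * from the scissor rectangle. */
struct plane { int a, b, c; };

/* Always walks all 7 planes -- the wasted work the text describes
 * when, say, the stamp is already known to be inside the scissor. */
static int inside_all_planes(const struct plane planes[7], int x, int y)
{
    for (int i = 0; i < 7; i++)
        if (planes[i].a * x + planes[i].b * y + planes[i].c < 0)
            return 0;   /* outside this half-plane */
    return 1;
}
```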
Whereas if you keep rasterization _outside_ the shader, you can have
specialized functions that do the rasterization based on the primitive
itself (if the triangle is fully inside the scissor, you need 3
planes; if the stamp is fully inside the triangle, you need zero).
Essentially you can "compose" by coupling two function calls: you call
a rasterization function that's specialized for the primitive, then a
shading function that's specialized for the state (but does not depend
on the primitive).
It makes sense: rasterization needs to be specialized for the
primitive, not the graphics state, whereas the shader needs to be
specialized for the state.
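A minimal sketch of that composition, with hypothetical names: a
trivial-accept rasterization function (stamp fully inside the
triangle, zero plane tests) paired with a state-specialized shading
function, where the two are selected independently.

```c
typedef int  (*rast_fn)(int x, int y);              /* per-primitive  */
typedef void (*shade_fn)(int x, int y, float *out); /* per-state      */

/* Stamp fully inside the triangle: no edge tests at all. */
static int rast_trivial_accept(int x, int y)
{
    (void)x; (void)y;
    return 1;
}

/* Stand-in for a shader specialized for the current state. */
static void shade_solid_red(int x, int y, float *out)
{
    (void)x; (void)y;
    out[0] = 1.0f; out[1] = 0.0f; out[2] = 0.0f;
}

/* The composition: primitive-specialized test, then state-specialized
 * shading.  Swapping either function doesn't touch the other. */
static int run_stamp(rast_fn rast, shade_fn shade,
                     int x, int y, float *color)
{
    if (!rast(x, y))
        return 0;
    shade(x, y, color);
    return 1;
}
```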
And this is just one of those non-intuitive things that aren't obvious
until one actually does a lot of profiling and a lot of
experimentation. And trust me, a lot of time was spent fine-tuning
this for llvmpipe (not by me -- most of the rasterization was done by
Keith Whitwell). By throwing llvmpipe out of the window and starting a
new software renderer from scratch, you'd just be subscribing to do it
all over again.
Whereas if, instead of starting from scratch, you take llvmpipe and
rewrite/replace one component at a time, you can reach exactly the
destination you want to reach, but you'll have something working every
step of the way; so when you take a bad step, you can measure the
performance impact and readjust. Plus, if you run out of time, you
still have something useful -- not yet another half-finished project,
which will quickly rot away.
Regarding generating SPIR-V -> scalar LLVM IR, then doing
whole-function vectorization: I don't think it's a bad idea per se. If
I was writing llvmpipe from scratch today, I'd do something like that,
especially because (scalar) LLVM IR is so pervasive in the graphics
ecosystem anyway.
It was only after I had TGSI -> LLVM IR all done that I stumbled into
http://compilers.cs.uni-saarland.de/projects/wfv/ .
I think the important thing here is that, once you've vectorized the
shader and converted your "texture_sample" to
"texture_sample.vector8", your "output_merger" intrinsics to
"output_merger.vector8", or your log2/exp2, you then slot in the
fine-tuned llvmpipe code for texture sampling, blending, and math, as
that's where your bottlenecks tend to be. Because if you plan to write
all the texture sampling from scratch, then you need a time/clone
machine to complete this in a summer; and if you just use LLVM's / the
standard C runtime's sqrt/log2/exp2/sin/cos, it would be dead slow.
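Purely to illustrate the renaming step (the intrinsic names are the
ones from the paragraph above; the helper itself is hypothetical):
after vectorization, each scalar intrinsic is rewritten to an 8-wide
name, and the hand-tuned SIMD implementations are linked in against
those names.

```c
#include <stdio.h>

/* Hypothetical mangling: "texture_sample" -> "texture_sample.vector8".
 * Returns the length of the vectorized name, as snprintf does. */
static int vector8_name(const char *scalar, char *out, size_t n)
{
    return snprintf(out, n, "%s.vector8", scalar);
}
```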
Anyway, I hope this helps. Best of luck.
Jose
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev