I know this is an old thread. I completely missed it the first time, but recently rediscovered it after reading http://www.phoronix.com/scan.php?page=news_item&px=Vulkan-CPU-Repository , and perhaps it's not too late for a couple of comments, FWIW.

On 13/02/17 02:17, Jacob Lifshay wrote:
forgot to add mesa-dev when I sent.
---------- Forwarded message ----------
From: "Jacob Lifshay" <programmerj...@gmail.com <mailto:programmerj...@gmail.com>>
Date: Feb 12, 2017 6:16 PM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Dave Airlie" <airl...@gmail.com <mailto:airl...@gmail.com>>
Cc:



On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com <mailto:airl...@gmail.com>> wrote:

    > I'm assuming that control barriers in Vulkan are identical to
    > barriers across a work-group in opencl. I was going to have a
    > work-group be a single OS thread, with the different work-items
    > mapped to SIMD lanes. If we need to have additional scheduling, I
    > have written a javascript compiler that supports generator
    > functions, so I mostly know how to write a llvm pass to implement
    > that. I was planning on writing the shader compiler using llvm,
    > using the whole-function-vectorization pass I will write, and using
    > the pre-existing spir-v to llvm translation layer. I would also
    > write some llvm passes to translate from texture reads and stuff to
    > basic vector ops.

    Well the problem is number of work-groups that gets launched could be
    quite high, and this can cause a large overhead in number of host
    threads that have to be launched. There was some discussion on this
    in mesa-dev archives back when I added softpipe compute shaders.


I would start a thread for each CPU, then have each thread run the compute shader a number of times, instead of having a thread per shader invocation.
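For illustration, a minimal sketch of that scheme with plain pthreads (the run_work_group callback and NUM_CPUS are hypothetical names, not taken from any existing project):

    #include <pthread.h>

    #define NUM_CPUS 4  /* would really be queried at runtime */

    struct job {
        unsigned first_group, group_count;
        void (*run_work_group)(unsigned group_id);
    };

    static void *worker(void *arg)
    {
        /* One OS thread sweeps many work-groups, instead of one
         * thread per shader invocation. */
        struct job *job = arg;
        for (unsigned i = 0; i < job->group_count; i++)
            job->run_work_group(job->first_group + i);
        return NULL;
    }

    static void dispatch(void (*run_work_group)(unsigned), unsigned total)
    {
        pthread_t thread[NUM_CPUS];
        struct job job[NUM_CPUS];

        for (unsigned t = 0; t < NUM_CPUS; t++) {
            job[t].first_group = total * t / NUM_CPUS;
            job[t].group_count =
                total * (t + 1) / NUM_CPUS - job[t].first_group;
            job[t].run_work_group = run_work_group;
            pthread_create(&thread[t], NULL, worker, &job[t]);
        }
        for (unsigned t = 0; t < NUM_CPUS; t++)
            pthread_join(thread[t], NULL);
    }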

At least for llvmpipe, last time I looked into this, green threads (user-space threads) seemed a simple, non-intrusive way of dealing with it --
https://lists.freedesktop.org/archives/mesa-dev/2016-April/114790.html
-- but it sounds like LLVM coroutines can handle this more effectively.
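To make the barrier problem concrete, here is a rough sketch (hypothetical names) of what either approach ultimately achieves: the shader body gets cut at the barrier, and one thread sweeps every work-item through each phase, so no work-item ever blocks:

    #define WG_SIZE 64

    /* Conceptually the original shader is:
     *
     *     void shader(int item) { phase_a(item); barrier(); phase_b(item); }
     *
     * After splitting at the barrier, a single OS thread can run the
     * whole work-group.  Any value live across the barrier has to be
     * spilled to per-item storage by the compiler. */
    static void run_work_group(void (*phase_a)(int), void (*phase_b)(int))
    {
        for (int item = 0; item < WG_SIZE; item++)
            phase_a(item);  /* everything before the barrier */

        /* The barrier is now implicit: all items completed phase_a. */

        for (int item = 0; item < WG_SIZE; item++)
            phase_b(item);  /* everything after the barrier */
    }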



    > I have a prototype rasterizer, however I haven't implemented binning
    > for triangles yet or implemented interpolation. currently, it can
    > handle triangles in 3D homogeneous and calculate edge equations.
    > https://github.com/programmerjake/tiled-renderer
    > A previous 3d renderer that doesn't implement any vectorization and
    > has opengl 1.x level functionality:
    > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp

    Well I think we already have a completely fine rasterizer and binning
    and whatever else in the llvmpipe code base. I'd much rather any Mesa
    based project doesn't throw all of that away; there is no reason the
    same swrast backend couldn't be abstracted to be used for both GL and
    Vulkan, and introducing another just because it's interesting isn't a
    great fit for long term project maintenance.

    If there are improvements to llvmpipe that need to be made, then that
    is something to possibly consider, but I'm not sure why a swrast
    vulkan needs a from-scratch raster implemented. For a project that is
    so large in scope, I'd think reusing that code would be of some use,
    since most of the fun stuff is all the texture sampling etc.


I actually think implementing the rasterization algorithm is the best part. I wanted the rasterization algorithm to be included in the shaders: e.g., triangle setup and binning would be tacked onto the end of the vertex shader, parameter interpolation and early z tests onto the beginning of the fragment shader, and blending onto the end. That way, llvm could do more specialization and instruction scheduling than is possible in llvmpipe now.

Parameter interpolation, early z test, and blending *are* already tacked onto llvmpipe's fragment shaders.


I don't see how to effectively tack triangle setup onto the vertex shader: the vertex shader applies to vertices, whereas triangle setup and binning apply to primitives. Usually, with llvmpipe, each vertex gets transformed only once, no matter how many triangles reference that vertex. The only way to tack triangle setup onto vertex shading would be to process vertices one primitive at a time. Of course, one could add an if-statement to skip reprocessing a vertex that was already processed, but then you have race conditions, and no benefit from inlining.
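For contrast, a minimal sketch of shading each vertex only once via a trivial post-transform cache keyed on the vertex index (types and names are illustrative, not llvmpipe's actual code):

    struct vertex_out { float pos[4]; /* + varyings */ };

    #define CACHE_SIZE 32

    struct vtx_cache {
        int               index[CACHE_SIZE];  /* initialize to -1 */
        struct vertex_out out[CACHE_SIZE];
    };

    static const struct vertex_out *
    shade_vertex_cached(struct vtx_cache *cache, int index,
                        void (*vertex_shader)(int idx, struct vertex_out *out))
    {
        unsigned slot = (unsigned)index % CACHE_SIZE;

        if (cache->index[slot] != index) {
            /* Miss: run the vertex shader exactly once for this index. */
            vertex_shader(index, &cache->out[slot]);
            cache->index[slot] = index;
        }
        /* Other triangles referencing this index hit the cache. */
        return &cache->out[slot];
    }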


And I'm afraid that tacking on rasterization too is one of those things that sound great on paper but turn out quite bad in practice. And I speak from experience: llvmpipe in fact had the last step of rasterization bolted onto the fragment shaders for some time. But we took it out because it was _slower_.

The issue is that if you bolt rasterization onto the shader body, you either:

- inline in the shader body the code for the maximum number of planes (which is 7: the 3 triangle edges plus the 4 sides of a scissor rect), and waste CPU cycles going through all of those tests, even though most of the time many of them aren't needed,

- or you generate if/for blocks for each plane, so you only do the needed tests, but then you have branch prediction issues...

Whereas if you keep rasterization _outside_ the shader, you can have specialized functions that rasterize based on the primitive itself (if the triangle is fully inside the scissor you need only 3 planes; if the stamp is fully inside the triangle you need zero). Essentially you "compose" by coupling two function calls: you call a rasterization function that's specialized for the primitive, then a shading function that's specialized for the state (but does not depend on the primitive).
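A rough sketch of that composition (function names and structs are made up for illustration; llvmpipe's real interfaces differ):

    struct tri   { float plane[7][3]; int num_planes; /* setup data */ };
    struct state { void (*shade_quad)(int x, int y, const struct tri *); };

    typedef void (*raster_fn)(const struct tri *, const struct state *);

    /* One variant per plane count, each JIT-compiled or hand-written. */
    static void raster_0_planes(const struct tri *t, const struct state *s)
    { /* stamp fully inside the triangle: no edge tests, just shade */ }
    static void raster_3_planes(const struct tri *t, const struct state *s)
    { /* triangle fully inside the scissor: test only the 3 edges */ }
    static void raster_7_planes(const struct tri *t, const struct state *s)
    { /* worst case: 3 edges + 4 scissor planes */ }

    static raster_fn pick_rasterizer(const struct tri *t)
    {
        /* Chosen per primitive... */
        if (t->num_planes == 0) return raster_0_planes;
        if (t->num_planes <= 3) return raster_3_planes;
        return raster_7_planes;
    }

    static void draw_tri(const struct tri *t, const struct state *s)
    {
        /* ...then coupled with the state-specialized shader. */
        pick_rasterizer(t)(t, s);
    }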

It makes sense: rasterization needs to be specialized for the primitive, not the graphics state; whereas the shader needs to be specialized for the state.



And this is just one of those non-intuitive things that aren't obvious until one actually does a lot of profiling and a lot of experimentation. And trust me, a lot of time was spent fine-tuning this for llvmpipe (not by me -- most of the rasterization was done by Keith Whitwell). By throwing llvmpipe out of the window and starting a new software renderer from scratch, you'd just be subscribing to do it all over again.

Whereas if, instead of starting from scratch, you take llvmpipe and rewrite/replace one component at a time, you can reach exactly the same destination you want to reach, but you'll have something working every step of the way; when you take a bad step, you can measure the performance impact and readjust. Plus, if you run out of time, you still have something useful -- not yet another half-finished project, which will quickly rot away.





Regarding generating SPIR-V -> scalar LLVM IR, then doing whole-function vectorization: I don't think it's a bad idea per se. If I was writing llvmpipe from scratch today, I'd do something like that, especially because (scalar) LLVM IR is so pervasive in the graphics ecosystem anyway.

It was only after I had the TGSI -> LLVM IR translation all done that I stumbled upon http://compilers.cs.uni-saarland.de/projects/wfv/ .

I think the important thing here is that, once you've vectorized the shader and converted your "texture_sample" to "texture_sample.vector8", your "output_merger" intrinsics to "output_merger.vector8", and your log2/exp2 likewise, you then slot in the fine-tuned llvmpipe code for texture sampling, blending, and math, as that's where your bottlenecks tend to be. If you plan to write all the texture sampling from scratch, you'll need a time/clone machine to complete this in a summer; and if you just use LLVM's / the standard C runtime's sqrt/log2/exp2/sin/cos, it will be dead slow.
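As a small illustration of the math side, here is what an 8-wide log2 replacement might look like (GCC/clang vector extensions; a deliberately crude approximation, just to show the shape of the code that would sit behind a hypothetical log2.vector8 intrinsic):

    typedef float v8f __attribute__((vector_size(32)));  /* 8 x float */
    typedef int   v8i __attribute__((vector_size(32)));  /* 8 x int32 */

    /* Cheap 8-wide log2: take the exponent straight from the float bits
     * and linearly interpolate across the mantissa.  Low precision, but
     * one SIMD kernel instead of eight libm calls per vector. */
    static v8f log2_v8(v8f x)
    {
        union { v8f f; v8i i; } u = { x };

        v8i exp = ((u.i >> 23) & 0xff) - 127;     /* unbias exponent    */
        u.i = (u.i & 0x007fffff) | 0x3f800000;    /* mantissa -> [1,2)  */

        return __builtin_convertvector(exp, v8f) + (u.f - 1.0f);
    }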


Anyway, I hope this helps.  Best of luck.

Jose
