I know this is an old thread. I completely missed it the first time, but recently rediscovered it after reading http://www.phoronix.com/scan.php?page=news_item&px=Vulkan-CPU-Repository , and perhaps it's not too late for a couple of comments, FWIW.

On 13/02/17 02:17, Jacob Lifshay wrote:
forgot to add mesa-dev when I sent.
---------- Forwarded message ----------
From: "Jacob Lifshay" <programmerj...@gmail.com <mailto:programmerj...@gmail.com>>
Date: Feb 12, 2017 6:16 PM
Subject: Re: [Mesa-dev] software implementation of vulkan for gsoc/evoc
To: "Dave Airlie" <airl...@gmail.com <mailto:airl...@gmail.com>>
Cc:



On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com <mailto:airl...@gmail.com>> wrote:

    > I'm assuming that control barriers in Vulkan are identical to
    > barriers across a work-group in opencl. I was going to have a
    > work-group be a single OS thread, with the different work-items
    > mapped to SIMD lanes. If we need to have additional scheduling, I
    > have written a javascript compiler that supports generator
    > functions, so I mostly know how to write a llvm pass to implement
    > that. I was planning on writing the shader compiler using llvm,
    > using the whole-function-vectorization pass I will write, and using
    > the pre-existing spir-v to llvm translation layer. I would also
    > write some llvm passes to translate from texture reads and stuff to
    > basic vector ops.

    Well the problem is number of work-groups that gets launched could be
    quite high, and this can cause a large overhead in number of host
    threads that have to be launched. There was some discussion on this
    in mesa-dev archives back when I added softpipe compute shaders.


I would start a thread for each CPU, then have each thread run the compute shader a number of times, instead of having a thread per shader invocation.
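For illustration, a minimal sketch of that scheme with plain pthreads (the run_work_group callback and NUM_CPUS are hypothetical names, not taken from any existing project):

    #include <pthread.h>

    #define NUM_CPUS 4  /* would really be queried at runtime */

    struct job {
        unsigned first_group, group_count;
        void (*run_work_group)(unsigned group_id);
    };

    static void *worker(void *arg)
    {
        /* One OS thread sweeps many work-groups, instead of one
         * thread per shader invocation. */
        struct job *job = arg;
        for (unsigned i = 0; i < job->group_count; i++)
            job->run_work_group(job->first_group + i);
        return NULL;
    }

    static void dispatch(void (*run_work_group)(unsigned), unsigned total)
    {
        pthread_t thread[NUM_CPUS];
        struct job job[NUM_CPUS];

        for (unsigned t = 0; t < NUM_CPUS; t++) {
            job[t].first_group = total * t / NUM_CPUS;
            job[t].group_count =
                total * (t + 1) / NUM_CPUS - job[t].first_group;
            job[t].run_work_group = run_work_group;
            pthread_create(&thread[t], NULL, worker, &job[t]);
        }
        for (unsigned t = 0; t < NUM_CPUS; t++)
            pthread_join(thread[t], NULL);
    }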

At least for llvmpipe, last time I looked into this, green threads (user-space threads) seemed a simple, non-intrusive way of dealing with it --
https://lists.freedesktop.org/archives/mesa-dev/2016-April/114790.html
-- but it sounds like LLVM coroutines can handle this more effectively.
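To make the barrier problem concrete, here is a rough sketch (hypothetical names) of what either approach ultimately achieves: the shader body gets cut at the barrier, and one thread sweeps every work-item through each phase, so no work-item ever blocks:

    #define WG_SIZE 64

    /* Conceptually the original shader is:
     *
     *     void shader(int item) { phase_a(item); barrier(); phase_b(item); }
     *
     * After splitting at the barrier, a single OS thread can run the
     * whole work-group.  Any value live across the barrier has to be
     * spilled to per-item storage by the compiler. */
    static void run_work_group(void (*phase_a)(int), void (*phase_b)(int))
    {
        for (int item = 0; item < WG_SIZE; item++)
            phase_a(item);  /* everything before the barrier */

        /* The barrier is now implicit: all items completed phase_a. */

        for (int item = 0; item < WG_SIZE; item++)
            phase_b(item);  /* everything after the barrier */
    }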



    > I have a prototype rasterizer, however I haven't implemented binning
    > for triangles yet or implemented interpolation. currently, it can
    > handle triangles in 3D homogeneous and calculate edge equations.
    > https://github.com/programmerjake/tiled-renderer
    > A previous 3d renderer that doesn't implement any vectorization and
    > has opengl 1.x level functionality:
    > https://github.com/programmerjake/lib3d/blob/master/softrender.cpp

    Well I think we already have a completely fine rasterizer and binning
    and whatever else in the llvmpipe code base. I'd much rather any Mesa
    based project doesn't throw all of that away; there is no reason the
    same swrast backend couldn't be abstracted to be used for both GL and
    Vulkan, and introducing another just because it's interesting isn't a
    great fit for long term project maintenance.

    If there are improvements to llvmpipe that need to be made, then that
    is something to possibly consider, but I'm not sure why a swrast
    vulkan needs a from-scratch raster implemented. For a project that is
    so large in scope, I'd think reusing that code would be of some use,
    since most of the fun stuff is all the texture sampling etc.


I actually think implementing the rasterization algorithm is the best part. I wanted the rasterization algorithm to be included in the shaders: e.g., triangle setup and binning would be tacked onto the end of the vertex shader, parameter interpolation and early z tests onto the beginning of the fragment shader, and blending onto the end. That way, llvm could do more specialization and instruction scheduling than is possible in llvmpipe now.

Parameter interpolation, early z test, and blending *are* already tacked onto llvmpipe's fragment shaders.


I don't see how to effectively tack triangle setup onto the vertex shader: the vertex shader applies to vertices, whereas triangle setup and binning apply to primitives. Usually, with llvmpipe, each vertex gets transformed only once, no matter how many triangles reference that vertex. The only way to tack triangle setup onto vertex shading would be to process vertices one primitive at a time. Of course, one could add an if-statement to skip reprocessing a vertex that was already processed, but then you have race conditions, and no benefit from inlining.
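For contrast, a minimal sketch of shading each vertex only once via a trivial post-transform cache keyed on the vertex index (types and names are illustrative, not llvmpipe's actual code):

    struct vertex_out { float pos[4]; /* + varyings */ };

    #define CACHE_SIZE 32

    struct vtx_cache {
        int               index[CACHE_SIZE];  /* initialize to -1 */
        struct vertex_out out[CACHE_SIZE];
    };

    static const struct vertex_out *
    shade_vertex_cached(struct vtx_cache *cache, int index,
                        void (*vertex_shader)(int idx, struct vertex_out *out))
    {
        unsigned slot = (unsigned)index % CACHE_SIZE;

        if (cache->index[slot] != index) {
            /* Miss: run the vertex shader exactly once for this index. */
            vertex_shader(index, &cache->out[slot]);
            cache->index[slot] = index;
        }
        /* Other triangles referencing this index hit the cache. */
        return &cache->out[slot];
    }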


And I'm afraid that tacking on rasterization too is one of those things that sound great on paper but turn out quite bad in practice. And I speak from experience: llvmpipe in fact had the last step of rasterization bolted onto the fragment shaders for some time. But we took it out because it was _slower_.

The issue is that if you bolt rasterization onto the shader body, you either:

- inline in the shader body the code for the maximum number of planes (which is 7: the 3 triangle edges plus the 4 sides of a scissor rect), and waste CPU cycles going through all of those tests, even though most of the time many of them aren't needed,

- or you generate if/for blocks for each plane, so you only do the needed tests, but then you have branch prediction issues...

Whereas if you keep rasterization _outside_ the shader, you can have specialized functions that rasterize based on the primitive itself (if the triangle is fully inside the scissor you need only 3 planes; if the stamp is fully inside the triangle you need zero). Essentially you "compose" by coupling two function calls: you call a rasterization function that's specialized for the primitive, then a shading function that's specialized for the state (but does not depend on the primitive).
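A rough sketch of that composition (function names and structs are made up for illustration; llvmpipe's real interfaces differ):

    struct tri   { float plane[7][3]; int num_planes; /* setup data */ };
    struct state { void (*shade_quad)(int x, int y, const struct tri *); };

    typedef void (*raster_fn)(const struct tri *, const struct state *);

    /* One variant per plane count, each JIT-compiled or hand-written. */
    static void raster_0_planes(const struct tri *t, const struct state *s)
    { /* stamp fully inside the triangle: no edge tests, just shade */ }
    static void raster_3_planes(const struct tri *t, const struct state *s)
    { /* triangle fully inside the scissor: test only the 3 edges */ }
    static void raster_7_planes(const struct tri *t, const struct state *s)
    { /* worst case: 3 edges + 4 scissor planes */ }

    static raster_fn pick_rasterizer(const struct tri *t)
    {
        /* Chosen per primitive... */
        if (t->num_planes == 0) return raster_0_planes;
        if (t->num_planes <= 3) return raster_3_planes;
        return raster_7_planes;
    }

    static void draw_tri(const struct tri *t, const struct state *s)
    {
        /* ...then coupled with the state-specialized shader. */
        pick_rasterizer(t)(t, s);
    }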

It makes sense: rasterization needs to be specialized for the primitive, not the graphics state; whereas the shader needs to be specialized for the state.



And this is just one of those non-intuitive things that aren't obvious until one actually does a lot of profiling and a lot of experimentation. And trust me, a lot of time was spent fine-tuning this for llvmpipe (not by me -- most of the rasterization was done by Keith Whitwell). By throwing llvmpipe out of the window and starting a new software renderer from scratch, you'd just be subscribing to do it all over again.

Whereas if, instead of starting from scratch, you take llvmpipe and rewrite/replace one component at a time, you can reach exactly the same destination you want to reach, but you'll have something working every step of the way; when you take a bad step, you can measure the performance impact and readjust. Plus, if you run out of time, you still have something useful -- not yet another half-finished project, which will quickly rot away.





Regarding generating SPIR-V -> scalar LLVM IR, then doing whole-function vectorization: I don't think it's a bad idea per se. If I was writing llvmpipe from scratch today, I'd do something like that, especially because (scalar) LLVM IR is so pervasive in the graphics ecosystem anyway.

It was only after I had the TGSI -> LLVM IR translation all done that I stumbled upon http://compilers.cs.uni-saarland.de/projects/wfv/ .

I think the important thing here is that, once you've vectorized the shader and converted your "texture_sample" to "texture_sample.vector8", your "output_merger" intrinsics to "output_merger.vector8", and your log2/exp2 likewise, you then slot in the fine-tuned llvmpipe code for texture sampling, blending, and math, as that's where your bottlenecks tend to be. If you plan to write all the texture sampling from scratch, you'll need a time/clone machine to complete this in a summer; and if you just use LLVM's / the standard C runtime's sqrt/log2/exp2/sin/cos, it will be dead slow.
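As a small illustration of the math side, here is what an 8-wide log2 replacement might look like (GCC/clang vector extensions; a deliberately crude approximation, just to show the shape of the code that would sit behind a hypothetical log2.vector8 intrinsic):

    typedef float v8f __attribute__((vector_size(32)));  /* 8 x float */
    typedef int   v8i __attribute__((vector_size(32)));  /* 8 x int32 */

    /* Cheap 8-wide log2: take the exponent straight from the float bits
     * and linearly interpolate across the mantissa.  Low precision, but
     * one SIMD kernel instead of eight libm calls per vector. */
    static v8f log2_v8(v8f x)
    {
        union { v8f f; v8i i; } u = { x };

        v8i exp = ((u.i >> 23) & 0xff) - 127;     /* unbias exponent    */
        u.i = (u.i & 0x007fffff) | 0x3f800000;    /* mantissa -> [1,2)  */

        return __builtin_convertvector(exp, v8f) + (u.f - 1.0f);
    }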


Anyway, I hope this helps.  Best of luck.

Jose
