On 11/06/17 07:59, Jacob Lifshay wrote:
On Sat, Jun 10, 2017 at 3:25 PM Jose Fonseca <jfons...@vmware.com> wrote:
I don't see how to effectively tack triangle setup onto the vertex
shader: the vertex shader applies to vertices, whereas triangle setup and
binning apply to primitives. Usually, each vertex gets transformed
only once with llvmpipe, no matter how many triangles refer to that vertex.
The only way to tack triangle setup onto vertex shading would be if
you processed vertices a primitive at a time. Of course one could put
in an if-statement to skip reprocessing a vertex that was already
processed, but then you have race conditions, and no benefit from
inlining.
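(To make the duplication concrete, a minimal single-threaded C++ sketch;
run_vs() and the other names are hypothetical stand-ins, not llvmpipe code:)

    #include <cstdint>
    #include <vector>

    struct Vtx { float pos[4]; };

    static Vtx run_vs(const Vtx &v) { return v; } // stand-in for the JITed VS

    // llvmpipe-style: each unique vertex is shaded once, no matter how
    // many triangles index it.
    static std::vector<Vtx> shade_once(const std::vector<Vtx> &in,
                                       const std::vector<uint32_t> &idx) {
        std::vector<Vtx> out(in.size());
        std::vector<bool> done(in.size(), false);
        for (uint32_t i : idx)
            if (!done[i]) { out[i] = run_vs(in[i]); done[i] = true; }
        return out;
    }
    // Fusing setup into the VS means walking triangles instead, so a vertex
    // shared by N triangles runs run_vs() N times; and guarding with done[i]
    // across binning threads is exactly the race condition mentioned above.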
I was mostly thinking of non-indexed vertices.
And I'm afraid that tacking on rasterization too is one of those things
that sound great on paper but turn out quite badly in practice. And I
speak from experience: in fact, llvmpipe had the last step of rasterization
bolted onto the fragment shaders for some time. But we took it out because
it was _slower_.
The issue is that if you bolt it onto the shader body, you either:
- inline code in the shader body for the maximum number of planes (which
is 7: the 3 sides of the triangle, plus the 4 sides of a scissor rect), and
waste CPU cycles going through all of those tests, even though most of the
time many of those tests aren't needed (a scalar sketch of this option
follows below);
- or you generate if/for blocks for each plane, so you only do the
needed tests, but then you have branch prediction issues...
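(A scalar sketch of the first option, with hypothetical names; each plane
is the usual half-space test a*x + b*y + c >= 0:)

    struct Plane { float a, b, c; };

    // The worst case baked into every shader: 3 triangle edges plus 4
    // scissor planes, evaluated unconditionally.
    static bool inside_all_7(const Plane planes[7], float x, float y) {
        bool inside = true;
        for (int i = 0; i < 7; ++i)  // always 7 tests per pixel/stamp...
            inside &= (planes[i].a * x + planes[i].b * y + planes[i].c) >= 0.0f;
        return inside;               // ...even when most are known-inside
    }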
Whereas if you keep rasterization _outside_ the shader you can have
specialized functions that do the rasterization based on the primitive
itself (if the triangle is fully inside the scissor you need 3 planes; if
the stamp is fully inside the triangle you need zero). Essentially you
can "compose" by coupling two function calls: you call a rasterization
function that's specialized for the primitive, then a shading function
that's specialized for the state (but does not depend on the primitive).
It makes sense: rasterization needs to be specialized for the primitive,
not the graphics state; whereas the shader needs to be specialized for
the state.
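(A sketch of that composition, with hypothetical types and names; llvmpipe
actually selects among several such specialized rasterizers at
triangle-setup time:)

    struct Tri   { bool fully_inside_scissor; /* edge equations, ... */ };
    struct Tile  { /* coverage masks, color/depth tiles, ... */ };
    struct State { /* blend/depth/texture state, ... */ };

    static void rast_3_planes(const Tri &, Tile &) { /* edges only */ }
    static void rast_7_planes(const Tri &, Tile &) { /* edges + scissor */ }

    typedef void (*RastFn)(const Tri &, Tile &);
    typedef void (*ShadeFn)(Tile &, const State &);

    static void draw_tri(const Tri &tri, Tile &tile, const State &st,
                         ShadeFn shade /* JIT-specialized for `st` */) {
        RastFn rast = tri.fully_inside_scissor ? rast_3_planes
                                               : rast_7_planes;
        rast(tri, tile);  // specialized for the primitive
        shade(tile, st);  // specialized for the state, primitive-agnostic
    }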
I am planning on generating a function for each primitive type and state
combination, or I can convert all primitives into triangles and just
have a function for each state. The state includes things like whether a
particular clipping/scissor equation needs to be checked. I did it that
way in my proof-of-concept code by using C++ templates to do the code
duplication:
https://github.com/programmerjake/tiled-renderer/blob/47e09f5d711803b8e899c3669fbeae3e62c9e32c/main.cpp#L366
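(In miniature, the template trick amounts to something like this;
hypothetical names, not the actual proof-of-concept code:)

    struct Scissor { float x0, y0, x1, y1; };

    // One boolean template parameter per optional test; the compiler emits
    // a branch-free variant for every combination.
    template <bool NeedScissor, bool NeedDepthClip>
    static bool accept(float x, float y, float z, const Scissor &sc) {
        if (NeedScissor &&
            (x < sc.x0 || x >= sc.x1 || y < sc.y0 || y >= sc.y1))
            return false;
        if (NeedDepthClip && (z < 0.0f || z > 1.0f))
            return false;
        return true;
    }
    // accept<true,false>, accept<false,true>, ... are the per-state
    // variants; with 7 independent tests this is where the 2**7 count
    // below comes from.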
I'm not sure there will be enough benefit from inlining to compensate for
the time spent compiling 2**7 variants of each shader to cope with all
possible incoming triangles...
And this is just one of those non-intuitive things that isn't obvious
until one actually does a lot of profiling and a lot of experimentation.
And trust me, a lot of time was spent fine-tuning this for llvmpipe (not
by me -- most of the rasterization was done by Keith Whitwell.) And by
throwing llvmpipe out of the window and starting a new software renderer
from scratch you'd just be signing up to do it all over again.
Whereas if, instead of starting from scratch, you take llvmpipe and
rewrite/replace one component at a time, you can reach exactly the same
destination you want to reach, but you'll have something working every
step of the way, so when you take a bad step you can measure the
performance impact and readjust. Plus, if you run out of time, you have
something useful -- not yet another half-finished project that will
quickly rot away.
In the case that the project is not finished this summer, I'm still
planning on working on it, just at a reduced rate. If all else fails, we
will at least have an up-to-date SPIR-V to LLVM converter that handles
the GLSL SPIR-V extensions.
Regarding generating SPIR-V -> scalar LLVM IR, then doing whole-function
vectorization: I don't think it's a bad idea per se. If I was writing
llvmpipe from scratch today I'd do something like that, especially
because (scalar) LLVM IR is so pervasive in the graphics ecosystem
anyway.
It was only after I had TGSI -> LLVM IR all done that I stumbled onto
http://compilers.cs.uni-saarland.de/projects/wfv/ .
I think the important thing here is that, once you've vectorized the
shader and converted your "texture_sample" to "texture_sample.vector8",
your "output_merger" intrinsics to "output_merger.vector8", and likewise
your log2/exp2, you then slot in the fine-tuned llvmpipe code for texture
sampling, blending, and math, as that's where your bottlenecks tend to
be. Because if you plan to write all the texture sampling from scratch
then you need a time/clone machine to complete this in a summer; and if
you just use LLVM's / the standard C runtime's sqrt/log2/exp2/sin/cos
then it will be dead slow.
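(One way that slotting step can look, sketched against the LLVM C++ API;
this assumes the tuned .vector8 bodies have been linked in from a
clang-compiled module with matching signatures:)

    #include "llvm/IR/Module.h"
    using namespace llvm;

    // Redirect every call of the widened stub to the hand-tuned
    // definition, then let the regular inliner do the rest.
    static void slotTunedImpl(Module &m, StringRef stub, StringRef tuned) {
        Function *decl = m.getFunction(stub);   // e.g. "texture_sample.vector8"
        Function *impl = m.getFunction(tuned);  // fine-tuned llvmpipe-style body
        if (decl && impl && decl->isDeclaration()) {
            decl->replaceAllUsesWith(impl);     // same signature assumed
            decl->eraseFromParent();
        }
    }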
I am planning on using C++ templates to help with a lot of the texture
sampler code generation -- clang can convert it to LLVM IR and then I
can inline it into the appropriate places. I think that all of the
non-compressed image formats should be pretty easy to handle that way,
as they are all pretty similar (bits packed into a long word, or members
of a struct). I can implement interpolation on top of the functions that
load and unpack the image elements from memory. I'd estimate that,
excluding the compressed texture formats, I'd need less than 10k lines
and maybe a week or two to implement it all. (Glad I don't have to
implement that in C.)
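(A tiny sketch of that template idea for uncompressed formats;
hypothetical names, little-endian packing assumed:)

    #include <cstdint>

    // Describe a packed channel by bit width and offset; clang turns the
    // instantiations into straight-line LLVM IR that can be inlined into
    // the sampler.
    template <unsigned Bits, unsigned Shift>
    static float unpack_unorm(uint32_t texel) {
        const uint32_t mask = (1u << Bits) - 1u;
        return float((texel >> Shift) & mask) * (1.0f / float(mask));
    }

    struct RGBA { float r, g, b, a; };

    static RGBA unpack_r8g8b8a8_unorm(uint32_t texel) {
        return { unpack_unorm<8, 0>(texel),  unpack_unorm<8, 8>(texel),
                 unpack_unorm<8, 16>(texel), unpack_unorm<8, 24>(texel) };
    }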
But how will 8/16-bit integer SIMD operations be generated in this
process? Because if you stick with floating point then you'll never
catch up with llvmpipe for texturing. And this is the bottleneck for
most practical applications.
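(For reference, the kind of per-channel math llvmpipe keeps in 8/16-bit
integer SIMD; a scalar sketch with an 8.8 fixed-point weight:)

    #include <cstdint>

    // Scalar form of one bilinear blend step; llvmpipe runs this on
    // 8/16-bit SIMD lanes (PMULLW/PADDW-style), fitting twice as many
    // texels per register as float32 lanes would.
    static inline uint8_t lerp_unorm8(uint8_t a, uint8_t b,
                                      unsigned w /* 0..256 */) {
        return (uint8_t)((a * (256u - w) + b * w) >> 8);
    }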
I am planning on compiling fdlibm with clang into LLVM IR, then running
my vectorization algorithm on all the functions. LLVM has a spot where
you can tell it that you have optimized vectorized math functions; I
could add them there, or implement another lowering pass to convert the
intrinsics to function calls, which can then be inlined. Hopefully, that
will save most of the work needed to implement vectorized math
functions.
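(The "spot" is TargetLibraryInfoImpl's vectorizable-function list, the
same hook -fveclib uses; a sketch, where the 8-wide function names are
assumed fdlibm-derived builds, not existing symbols:)

    #include "llvm/Analysis/TargetLibraryInfo.h"
    using namespace llvm;

    // Tell the vectorizers that 8-wide versions of these libm entry
    // points exist; they will then emit calls that can later be inlined.
    static void registerVecMath(TargetLibraryInfoImpl &tlii) {
        static const VecDesc descs[] = {
            {"logf", "v8_logf", 8},  // v8_*: vectorized fdlibm builds (assumed)
            {"expf", "v8_expf", 8},
            {"sinf", "v8_sinf", 8},
        };
        tlii.addVectorizableFunctions(descs);
    }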
I don't see how a whole-function vectorization pass will transform
something like this
http://www.netlib.org/fdlibm/e_log.c
into something like this
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/gallivm/lp_bld_arit.c#n3451
Vectorization is only a (actually small) part of this: one doesn't need
or want to deal with subnormals, graphics APIs have a higher tolerance
for imprecision, and since this is a software renderer you don't want to
be more precise than you have to. Furthermore, the fdlibm implementations
don't seem amenable to vectorization -- they are full of control flow and
if/for loops. Not to mention: they use _double_ precision...
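(For flavor, a scalar C++ rendition of the gallivm approach: pull the
exponent out of the float's bits and run a short polynomial on the
mantissa, ignoring subnormals/NaN. The coefficients below are a rough
minimax fit for illustration, not llvmpipe's:)

    #include <cstdint>
    #include <cstring>

    // log2(x) for normal, positive x: the exponent comes straight from
    // the bits; a quadratic approximates log2(m) for m in [1,2), with
    // roughly ~0.005 absolute error -- fine for graphics, nothing like
    // fdlibm's guarantees.
    static float fast_log2(float x) {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);
        float e = (float)(int)((bits >> 23) & 0xff) - 127.0f;
        uint32_t mbits = (bits & 0x007fffffu) | 0x3f800000u; // m in [1,2)
        float m;
        std::memcpy(&m, &mbits, sizeof m);
        float p = (-0.34484843f * m + 2.02466578f) * m - 1.67487759f;
        return e + p;
    }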
I have to say I'm missing the plot here. Is the objective to make a
fast software renderer, or just another software renderer? Because in
some cases you justify design decisions by the pursuit of performance,
whereas in others you make design decisions in spite of poor
performance.
I can certainly relate to the wish to do cool stuff (and IMO software
renderers are among the most fun things to work on in graphics), and I
don't expect you to take on grunt work (that's why employment exists.)
But I just find it a pity that a better overlap between cool/fun and
useful wasn't found. We already have 4 software renderers in Mesa, two of
them fast. And to me it seems we're putting ourselves in a lose-lose
situation: if you're not successful (i.e. this new software renderer
doesn't reach critical mass and eventually dies down when you stop
working on it), you've just wasted your time and we've missed an
opportunity to leverage your skills. If you are successful, we have yet
another software renderer in Mesa competing for attention...
On the flip side, there are some interesting ideas in what you propose,
and I'm curious to see how they pan out.
Jose