On 13.02.2017 03:17, Jacob Lifshay wrote:
> On Feb 12, 2017 5:34 PM, "Dave Airlie" <airl...@gmail.com> wrote:
>>> I'm assuming that control barriers in Vulkan are identical to
>>> barriers across a work-group in OpenCL. I was going to have a
>>> work-group be a single OS thread, with the different work-items
>>> mapped to SIMD lanes. If we need additional scheduling, I have
>>> written a JavaScript compiler that supports generator functions, so
>>> I mostly know how to write an LLVM pass to implement that. I was
>>> planning on writing the shader compiler using LLVM, using the
>>> whole-function-vectorization pass I will write, and using the
>>> pre-existing SPIR-V to LLVM translation layer. I would also write
>>> some LLVM passes to translate texture reads and the like into basic
>>> vector ops.
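For context, whole-function vectorization rewrites the scalar shader
so that each value becomes a vector with one lane per work-item. A
rough C sketch of the effect of that transform, with made-up names and
a vector width of 4:

    /* Scalar shader body, as written (sketch): */
    static float shade(float a, float b)
    {
        return a * b + 1.0f;
    }

    /* After whole-function vectorization, one call computes
     * VECTOR_WIDTH work-items at once, one per SIMD lane (emulated
     * here with fixed-size arrays so the compiler can vectorize): */
    #define VECTOR_WIDTH 4

    typedef struct { float lane[VECTOR_WIDTH]; } vfloat;

    static vfloat shade_v(vfloat a, vfloat b)
    {
        vfloat r;
        for (int i = 0; i < VECTOR_WIDTH; i++)
            r.lane[i] = a.lane[i] * b.lane[i] + 1.0f;
        return r;
    }

Divergent control flow then becomes lane masking, which is the part
the vectorization pass has to get right.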
>> Well, the problem is that the number of work-groups that get
>> launched could be quite high, and this can cause a large overhead
>> in the number of host threads that have to be launched. There was
>> some discussion of this in the mesa-dev archives back when I added
>> softpipe compute shaders.
> I would start a thread for each CPU, then have each thread run the
> compute shader a number of times, instead of having a thread per
> shader invocation.
This will not work.
Please read again what the barrier() instruction does: when a
work-group hits a barrier() call, _all_ threads within the work-group
must reach that barrier() before any of them may continue past it.
So you need a way of suspending and resuming shader threads when they
reach the barrier() call.
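To see why, consider a shader that communicates through work-group
shared memory. A C-flavoured sketch, with invented names:

    #define GROUP_SIZE 64

    static float shared_data[GROUP_SIZE];

    /* One invocation's view of the shader (sketch): */
    static float invocation_body(unsigned local_id)
    {
        shared_data[local_id] = 2.0f * (float)local_id;
        /* barrier(): every invocation in the group must have executed
         * the store above before ANY invocation executes the load
         * below. A plain `for (id = 0; id < GROUP_SIZE; id++)
         * run_to_completion(id);` loop violates this: invocation 0
         * would read shared_data[1] before invocation 1 wrote it. */
        return shared_data[(local_id + 1) % GROUP_SIZE];
    }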
The brute-force way of doing this would be to have one OS thread per
shader thread (or per N shader threads, where N is a fixed number
corresponding to SIMD lanes), but that gives you a giant number of OS
threads to contend with.
The alternative is to do "threads" in user space, and there are a bunch
of options for that. LLVM coroutines are worth checking out, since I
think they're more or less designed for that kind of thing. Another
option is user space stack switching, or perhaps something entirely
different.
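As a rough illustration of the stack-switching option, here is a
sketch using POSIX ucontext. It assumes exactly one barrier() per
shader and invents all names; LLVM coroutines would give you the same
suspension points without a dedicated stack per invocation:

    #include <ucontext.h>

    #define GROUP_SIZE 64
    #define STACK_SIZE (64 * 1024)

    static ucontext_t scheduler_ctx;
    static ucontext_t invocation_ctx[GROUP_SIZE];
    static char stacks[GROUP_SIZE][STACK_SIZE];

    /* Called by the shader when it hits barrier(): suspend this
     * invocation and hand control back to the scheduler. */
    static void barrier_yield(int local_id)
    {
        swapcontext(&invocation_ctx[local_id], &scheduler_ctx);
    }

    /* The jitted shader body, which calls barrier_yield() at its
     * barrier (hypothetical). */
    extern void shader_body(int local_id);

    static void run_workgroup(void)
    {
        /* Phase 1: run every invocation up to its barrier() call. */
        for (int i = 0; i < GROUP_SIZE; i++) {
            getcontext(&invocation_ctx[i]);
            invocation_ctx[i].uc_stack.ss_sp = stacks[i];
            invocation_ctx[i].uc_stack.ss_size = STACK_SIZE;
            invocation_ctx[i].uc_link = &scheduler_ctx;
            makecontext(&invocation_ctx[i],
                        (void (*)(void))shader_body, 1, i);
            swapcontext(&scheduler_ctx, &invocation_ctx[i]);
        }
        /* Phase 2: everyone has reached the barrier; resume each
         * invocation until it finishes (returning via uc_link). */
        for (int i = 0; i < GROUP_SIZE; i++)
            swapcontext(&scheduler_ctx, &invocation_ctx[i]);
    }

A real implementation would need multiple barriers, growable stacks,
and N invocations per SIMD vector, but the control transfer is the
same.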
Nicolai
>>> I have a prototype rasterizer, however I haven't implemented
>>> binning for triangles yet or implemented interpolation. Currently,
>>> it can handle triangles in 3D homogeneous coordinates and calculate
>>> edge equations.
>>> https://github.com/programmerjake/tiled-renderer
>>> A previous 3D renderer that doesn't implement any vectorization and
>>> has OpenGL 1.x-level functionality:
>>> https://github.com/programmerjake/lib3d/blob/master/softrender.cpp
>> Well, I think we already have a completely fine rasterizer, binning,
>> and whatever else in the llvmpipe code base. I'd much rather any
>> Mesa-based project didn't throw all of that away; there is no reason
>> the same swrast backend couldn't be abstracted to be used for both
>> GL and Vulkan, and introducing another just because it's interesting
>> isn't a great fit for long-term project maintenance.
>>
>> If there are improvements to llvmpipe that need to be made, then
>> that is something to possibly consider, but I'm not sure why a
>> swrast Vulkan needs a from-scratch rasterizer. For a project that is
>> so large in scope, I'd think reusing that code would be of some use,
>> since most of the fun stuff is all the texture sampling etc.
> I actually think implementing the rasterization algorithm is the
> best part. I wanted the rasterization algorithm to be included in
> the shaders; e.g., triangle setup and binning would be tacked on to
> the end of the vertex shader, parameter interpolation and early z
> tests would be tacked on to the beginning of the fragment shader,
> and blending on to the end. That way, LLVM could do more
> specialization and instruction scheduling than is possible in
> llvmpipe now.
> So the tile rendering function would essentially be:
>
>     for (i = 0; i < triangle_count; i += vector_width)
>         jit_functions[i](tile_x, tile_y, &triangle_setup_results[i]);
>
> as opposed to the current llvmpipe code, where there is a large
> amount of fixed code that isn't optimized with the shaders.
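To make that concrete, here is a scalar C sketch of what one such
fused tile function could look like before vectorization; every name
and the tile layout are invented, and steps 3 and 4 mark where LLVM
would inline the actual fragment shader and blend state:

    #define TILE_SIZE 64

    struct triangle_setup {
        float edge[3][3]; /* A, B, C of the three edge equations;
                             inside when Ax + By + C >= 0 */
        float zplane[3];  /* z = zplane[0]*x + zplane[1]*y + zplane[2] */
        /* ... plus attribute interpolation coefficients ... */
    };

    /* One of these would be jitted per pipeline state, with the
     * fragment shader and blend equations inlined into the loop. */
    static void fused_tile_fn(int tile_x, int tile_y,
                              const struct triangle_setup *t,
                              float *depth, unsigned *color)
    {
        for (int y = 0; y < TILE_SIZE; y++) {
            for (int x = 0; x < TILE_SIZE; x++) {
                float px = (float)(tile_x + x) + 0.5f;
                float py = (float)(tile_y + y) + 0.5f;

                /* 1. rasterize: edge-equation inside test */
                int inside = 1;
                for (int e = 0; e < 3; e++)
                    inside &= (t->edge[e][0] * px + t->edge[e][1] * py
                               + t->edge[e][2]) >= 0.0f;
                if (!inside)
                    continue;

                int i = y * TILE_SIZE + x;

                /* 2. interpolate z and do the early z test */
                float z = t->zplane[0] * px + t->zplane[1] * py
                          + t->zplane[2];
                if (z >= depth[i])
                    continue;
                depth[i] = z;

                /* 3. fragment shader body inlined here by LLVM ...  */
                /* 4. ... followed by blending specialized to state. */
                color[i] = ~0u;
            }
        }
    }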
>>> The scope that I intended to complete is the bare minimum to be
>>> Vulkan conformant (i.e. no tessellation and no geometry shaders):
>>> implementing a loadable ICD for Linux and Windows that supports a
>>> single queue; vertex, fragment, and compute shaders; events,
>>> semaphores, and fences; images with the minimum requirements; an
>>> f32 depth buffer or an f24 depth buffer with 8-bit stencil; and a
>>> yet-to-be-determined compressed format. For the image optimal
>>> layouts, I will probably use the same chunked layout I use in
>>> https://github.com/programmerjake/tiled-renderer/blob/master2/image.h#L59,
>>> where I have a linear array of chunks where each chunk has a linear
>>> array of texels. If you think that's too big, we could leave out
>>> all of the image formats except the two depth-stencil formats, the
>>> 8-bit and 32-bit integer formats, and the 32-bit float formats.
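For what it's worth, texel addressing in a chunked layout like the one
described stays cheap. A sketch, with invented chunk dimensions
(powers of two, so the divisions compile to shifts):

    #include <stddef.h>
    #include <stdint.h>

    #define CHUNK_W 8
    #define CHUNK_H 8

    /* Index of texel (x, y) in a buffer laid out as a linear array
     * of CHUNK_W x CHUNK_H chunks, each of which holds a linear
     * array of texels. */
    static size_t texel_index(uint32_t x, uint32_t y,
                              uint32_t width_in_chunks)
    {
        uint32_t chunk_x = x / CHUNK_W, sub_x = x % CHUNK_W;
        uint32_t chunk_y = y / CHUNK_H, sub_y = y % CHUNK_H;
        size_t chunk = (size_t)chunk_y * width_in_chunks + chunk_x;
        return chunk * (CHUNK_W * CHUNK_H)
               + (size_t)sub_y * CHUNK_W + sub_x;
    }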
>> Seems like quite a large scope, possibly a bit big for a GSoC,
>> especially one that intends not to use any existing Mesa code.
> Most of the Vulkan functions have a simple implementation when we
> don't need to worry about building anything for a GPU or about
> synchronization (because we have only one queue), and LLVM implements
> most of the rest of the needed functionality. If we leave out most of
> the image formats, that would probably cut the amount of code by a
> third.
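To illustrate, with one queue that executes everything synchronously
at submit time, even vkQueueSubmit collapses to a few lines. This
sketch invents execute_command_buffer and fence_signal, ignores
semaphores (roughly speaking, with a single queue any queue-internal
wait is already satisfied by the time we get here), and skips the ICD
dispatch plumbing:

    #include <vulkan/vulkan.h>

    static void execute_command_buffer(VkCommandBuffer cb); /* hypothetical */
    static void fence_signal(VkFence fence);                /* hypothetical */

    VkResult vkQueueSubmit(VkQueue queue, uint32_t submitCount,
                           const VkSubmitInfo *pSubmits, VkFence fence)
    {
        (void)queue; /* only one queue exists */

        for (uint32_t i = 0; i < submitCount; i++)
            for (uint32_t j = 0;
                 j < pSubmits[i].commandBufferCount; j++)
                execute_command_buffer(pSubmits[i].pCommandBuffers[j]);

        /* All work has finished by the time we get here, so the
         * fence can be signalled immediately. */
        if (fence != VK_NULL_HANDLE)
            fence_signal(fence);
        return VK_SUCCESS;
    }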
>> Dave.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev