On 13.02.2017 17:54, Jacob Lifshay wrote:
The algorithm I was going to use would take the union of the sets of live
variables over all barriers and create an array of structs that holds them
all. Then, for each barrier, it would insert the code to store all live
variables, end the for loop over tid_in_workgroup, run the memory barrier,
start another for loop over tid_in_workgroup, and load all live variables.
Okay, sounds reasonable in theory.
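The store-and-reload scheme described above can be sketched in scalar C as follows. This is only an illustration, not llvmpipe code: the example shader, the `live_vars` struct, the `run_workgroup_spill` function and the workgroup size of 8 are all hypothetical, and the set of live variables (just `x` here) is picked by hand rather than computed by an analysis.

```c
enum { WORKGROUP_SIZE = 8 }; /* hypothetical size, kept small for the example */

/* Hypothetical spill slot: holds every variable that is live at a barrier. */
struct live_vars {
    int x;
};

/*
 * Hypothetical shader being transformed:
 *
 *     int x = in[tid] * 2;
 *     shared[tid] = x;
 *     barrier();
 *     out[tid] = x + shared[(tid + 1) % WORKGROUP_SIZE];
 *
 * x is defined before the barrier and used after it, so it must be
 * saved per invocation when the loop over tid is split.
 */
void run_workgroup_spill(const int *in, int *out)
{
    struct live_vars spill[WORKGROUP_SIZE]; /* one slot per invocation */
    int shared[WORKGROUP_SIZE];

    /* Section before the barrier, run for every invocation. */
    for (int tid = 0; tid < WORKGROUP_SIZE; tid++) {
        int x = in[tid] * 2;
        shared[tid] = x;
        spill[tid].x = x; /* store live variables */
    }

    /* The memory barrier would go here, if one is needed at all. */

    /* Section after the barrier, run for every invocation. */
    for (int tid = 0; tid < WORKGROUP_SIZE; tid++) {
        int x = spill[tid].x; /* load live variables */
        out[tid] = x + shared[(tid + 1) % WORKGROUP_SIZE];
    }
}
```

Variables that are not live across a barrier, such as temporaries used only inside one section, can stay as ordinary locals of the loop body and need no slot in the struct.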
There are some issues, like: how do you actually determine live
variables? If you're working off TGSI like llvmpipe does today, you'd
need to write your own analysis for that, but in a structured control
flow graph like TGSI has, that shouldn't be too difficult.
I'd still recommend that you at least seriously read through the LLVM
coroutine stuff.
Cheers,
Nicolai
Jacob Lifshay
On Feb 13, 2017 08:45, "Nicolai Hähnle" <nhaeh...@gmail.com> wrote:
[ re-adding mesa-dev on the assumption that it got dropped by accident ]
On 13.02.2017 17:27, Jacob Lifshay wrote:
I would start a thread for each cpu, then have each thread run the
compute shader a number of times instead of having a thread per shader
invocation.
This will not work.
Please, read again what the barrier() instruction does: When the
barrier() call is reached, _all_ threads within the workgroup are
supposed to be run until they reach that barrier() call.
To clarify, I had meant that each os thread would run the sections of
the shader between the barriers for all the shader invocations in a work
group, then, when it finished the work group, it would go to the next
work group assigned to the os thread.
so, if our shader is:
a = b + tid;
barrier();
d = e + f;
and our simd width is 4, our work-group size is 128, and we have 16 os
threads, then it will run for each os thread:
    // stride by the number of os threads (16 here), so that each work
    // group is run by exactly one os thread
    for(workgroup = os_thread_index; workgroup < workgroup_count; workgroup += 16)
    {
        // section before the barrier: a = b + tid, 4 invocations at a time
        for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
        {
            ivec4 tid = ivec4(0, 1, 2, 3) + ivec4(tid_in_workgroup + workgroup * 128);
            a[tid_in_workgroup / 4] = ivec_add(b[tid_in_workgroup / 4], tid);
        }
        memory_fence(); // if needed
        // section after the barrier: d = e + f
        for(tid_in_workgroup = 0; tid_in_workgroup < 128; tid_in_workgroup += 4)
        {
            d[tid_in_workgroup / 4] = vec_add(e[tid_in_workgroup / 4], f[tid_in_workgroup / 4]);
        }
    }
// after this, we run the next rendering or compute job
Okay good, that's the right concept.
Actually doing that is not at all straightforward though: consider
that the barrier() might occur inside a loop in the shader.
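To illustrate the loop case, here is a minimal scalar C sketch of one way it could be handled. It is not llvmpipe code, and the example shader, the `run_workgroup_loop` function and the sizes are hypothetical. It relies on the GLSL rule that barrier() may only be called in uniform control flow, so every invocation runs the enclosing loop the same number of times; that makes it legal to hoist the shader loop outside the per-invocation loops.

```c
enum { WORKGROUP_SIZE = 4, ITERATIONS = 3 }; /* hypothetical sizes */

/*
 * Hypothetical shader being transformed:
 *
 *     int acc = tid;
 *     for (int i = 0; i < ITERATIONS; i++) {
 *         shared[tid] = acc;
 *         barrier();
 *         acc += shared[(tid + 1) % WORKGROUP_SIZE];
 *         barrier();
 *     }
 *     out[tid] = acc;
 *
 * acc is live across both barriers, so it becomes an array with one
 * slot per invocation. The loop counter i is uniform and stays scalar.
 */
void run_workgroup_loop(int *out)
{
    int acc[WORKGROUP_SIZE]; /* spilled live variable */
    int shared[WORKGROUP_SIZE];

    for (int tid = 0; tid < WORKGROUP_SIZE; tid++)
        acc[tid] = tid;

    /* The shader's loop now encloses the loops over invocations. */
    for (int i = 0; i < ITERATIONS; i++) {
        for (int tid = 0; tid < WORKGROUP_SIZE; tid++)
            shared[tid] = acc[tid];
        /* first barrier(): memory fence here if needed */
        for (int tid = 0; tid < WORKGROUP_SIZE; tid++)
            acc[tid] += shared[(tid + 1) % WORKGROUP_SIZE];
        /* second barrier(): memory fence here if needed */
    }

    for (int tid = 0; tid < WORKGROUP_SIZE; tid++)
        out[tid] = acc[tid];
}
```

Doing this by hand is easy for one shader. The hard part is doing it automatically for arbitrary structured control flow, including nested loops and barriers inside uniform branches.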
So if you implemented that within the framework of llvmpipe, you'd
make a lot of people very happy: it would allow finally adding
compute shader support to llvmpipe. Mind you, that in itself would
already be a pretty decent-sized project for GSoC!
Cheers,
Nicolai