On 28/09/15 14:51, Marek Olšák wrote:
On Mon, Sep 28, 2015 at 2:55 PM, Jose Fonseca <jfons...@vmware.com> wrote:
On 27/09/15 19:14, Marek Olšák wrote:

Hi,

For some reason, st/mesa assumes that shaders can't be shared by
multiple contexts and therefore has a context pointer in shader keys.
I'd like to get rid of this dependency, because there is no reason why
shaders would need to be tied to a context. In fact, shaders don't
even need a device, they just need a compiler.



This is becoming a bigger issue with latest games that might prefer
compiling shaders in another thread using a different GL context.

As far as I know, shaders should be context-shareable on all hardware
drivers.


The thing to keep in mind is that, although in theory gallium shaders could
be compiled once and reused by many pipe contexts, in practice many drivers
need to produce shader variants that mix in state from other context-specific
state objects, at draw time.

llvmpipe and the svga pipe driver are examples of such drivers that I know
of. Maybe the amd/nvidia gallium drivers are immune to that problem, but my
understanding was that very little hardware has internal state objects that
line up perfectly with gallium state objects, hence that this problem was
widespread.


And because until now shaders were per-context, pipe drivers can attach the
variants to the shader object without any worries about multi-threading, as
it's guaranteed that no other thread is operating on them.

I'm well aware of that issue.

If you're aware of it but didn't think it worth mentioning, then you clearly are underestimating the issue.

Multi-threading is never easy. I think it's worth opening your mind and thinking things through more thoroughly.

You can use a mutex to prevent two threads
from adding two shader variants simultaneously into a pipe shader. This
should prevent corrupting the linked list of variants. Looking up the
required shader variant based on states doesn't need locks if you add
an atomic counter containing how many variants are in the list,
so the lookup function never has to read the tail of the list
where new variants may concurrently be added by other threads.

Drivers also shouldn't attach state pointers to shaders; instead
they should attach copies of the state. All in all, this is quite easily
fixable and therefore not a big issue. It certainly doesn't prevent
shaders from becoming screen objects.
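A minimal sketch of that scheme, with invented names (a real driver would key variants on its own state structs, and the memory-ordering subtleties are glossed over with sequentially consistent atomics): insertion takes the mutex, lookup walks at most `num_variants` published nodes lock-free, and a racy miss simply falls back to the locked path, which re-checks.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One compiled variant; the list is append-only at the head, so a
 * published node's key and next pointer never change afterwards. */
struct shader_variant {
   unsigned key;                  /* stand-in for the real variant key */
   struct shader_variant *next;
};

struct shared_shader {
   pthread_mutex_t mutex;         /* taken only when adding a variant */
   atomic_uint num_variants;      /* count of fully published entries */
   _Atomic(struct shader_variant *) head;
};

/* Lock-free lookup: walk at most num_variants nodes, so the walk never
 * runs past entries another thread is still publishing.  A miss under
 * a race is harmless; the caller retries under the mutex. */
static struct shader_variant *
lookup_variant(struct shared_shader *sh, unsigned key)
{
   unsigned count = atomic_load(&sh->num_variants);
   struct shader_variant *v = atomic_load(&sh->head);

   for (unsigned i = 0; i < count && v; i++, v = v->next) {
      if (v->key == key)
         return v;
   }
   return NULL;
}

/* Slow path: compile the variant (elided) and insert under the mutex. */
static struct shader_variant *
get_or_add_variant(struct shared_shader *sh, unsigned key)
{
   struct shader_variant *v = lookup_variant(sh, key);
   if (v)
      return v;

   pthread_mutex_lock(&sh->mutex);
   /* Re-check: another thread may have added it while we waited. */
   v = lookup_variant(sh, key);
   if (!v) {
      v = calloc(1, sizeof(*v));
      v->key = key;
      v->next = atomic_load(&sh->head);
      atomic_store(&sh->head, v);              /* publish the node... */
      atomic_fetch_add(&sh->num_variants, 1);  /* ...then bump the count */
   }
   pthread_mutex_unlock(&sh->mutex);
   return v;
}
```

Because the count is bumped only after the node is fully reachable, a reader never iterates into a half-initialized entry; the worst case is a spurious miss that costs one mutex acquisition.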



So, if you want to make pipe shaders global, you need to ensure that all
shader object manipulations are thread-safe.  In particular, if you still
want to attach variants to the shaders, you need to ensure access to mutable
members is protected with a mutex.  Which means you might be forced to lock
mutexes on hot paths, severely impacting performance (single- and multi-
threaded apps alike).

The mutex would only have to be locked when a new shader variant is
added, which is a time-consuming operation by itself (compilation
etc.), so there is no performance concern from adding a mutex.

Other cases don't need the lock.

I don't see how that can work.

You need to lock not only when adding variants, but on _all_ accesses, including traversing the variants to search for the right one.

That is, _every_ time you need to search for a variant in the screen shader, you'll need to lock the shader. If you don't, a different thread might write to the variant list/hashtable exactly at the same time you're traversing it, invalidating the pointers you're using to iterate over the variants.

In other words, you'll be locking mutexes on pretty much every draw call...

Personally I think that's a poor design. I don't know how big the mutex lock overhead is in practice nowadays, but taking a lock per draw seems bad, because it could all be avoided with a better design.



One could avoid the threading primitives by attaching the variants to the
contexts themselves -- but then that's no different from what we do today --
ie, you'd be compiling as often as today.

Like I said above, it's easy to add shader variants to screen-based
shaders with a mutex that has no impact on performance.


I disagree with your option 1). But if you're still determined, by all means, do option 2). I think you'll regret it though: down the road you'll be either struggling with multi-thread correctness or performance.

I don't feel too strongly either way: maybe the benefits of having shaders
shared by many pipe contexts are indeed worth the trouble.  But let there be
no doubt: that implies making all operations on shader objects thread-safe.
And that is not a trivial matter.  It's certainly not as easy as just coping
with a null pipe_context argument.


I see only 2 options out of this:

1) Removing the context pointer from the shader keys. If drivers need
this, they should handle it by themselves. This will simplify st/mesa,
because TCS, TES, GS won't need the variant-tracking machinery.

2) Adding PIPE_CAP_SHAREABLE_SHADERS and if the cap is 1, set the
context pointer to NULL in all keys.

What do you think?

Marek


I think there are other options worth considering:


a) I wonder if it wouldn't be better the other way around: if drivers want to
avoid compiling the same shader for many contexts, they should maintain a
shader cache internally that's shared by all contexts.

    That can be done much more efficiently -- there are many ways to
implement a global cache that avoids frequent locking.

    And the way to avoid reinventing the wheel in every driver would be to
have a helper module that takes care of it -- ie, a general
auxiliary module that maintains the shader cache in a thread-safe manner.  That
module could potentially even learn to (de)serialize the cache from disk.

This is too complicated for now and not required for having
screen-based shaders.



b) split pipe shader objects in two parts:

    1) a global object, which would allow the pipe drivers to pre-compile the
TGSI into an internal IR

    2) per-context objects, so that pipe drivers could produce variants of
that IR which take into account the currently bound state

   And for pipe drivers that can't pre-compile anything, 1) would be nothing
more than a duplication of TGSI tokens.

I assume most drivers just don't want per-context shaders, so adding
an interface for it would be counterproductive.

Maybe I didn't explain it well. The state tracker can't choose between 1) and 2) -- it needs to create both.

That is, the state tracker would:
- create a screen-shader from the TGSI tokens (one for the whole share group)
- then create a context-shader from the screen-shader, one for each context that shader is used by (so drivers can attach the variants to the per-context shaders without locking)






c) introduce Metal/DX12/Vulkan-like "pipeline state objects", whereby the
state tracker declares upfront how the whole pipeline state is
bound together, so the driver can produce the shader variant when that
pipeline state is created, and doesn't need to worry about shader variants
at draw time.

    This is in fact like a) but moves the burden of creating the variant
cache into the state tracker (as the pipeline state cache) where it can be
shared by all pipe drivers.
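For illustration, such an object might look roughly like this. This is purely hypothetical: none of these names exist in the real Gallium headers, and a real design would reference the actual CSO types rather than void pointers.

```c
/* Hypothetical pipeline state object for Gallium; illustrative only. */
struct pipe_pipeline_state {
   void *vs, *gs, *fs;                       /* bound shader CSOs */
   void *blend, *rasterizer, *zsa, *velems;  /* bound fixed-function CSOs */
};

/* Because the whole pipeline is known at creation time, the driver can
 * build the exact shader variant here instead of at draw time. */
void *create_pipeline_state(struct pipe_screen *screen,
                            const struct pipe_pipeline_state *state);
```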

This is not applicable to OpenGL, because it doesn't have pipeline
objects, and I mainly optimize for OpenGL here.

One thing doesn't imply the other.

OpenGL doesn't have CSOs either, yet Gallium has CSOs.

So the fact that OpenGL doesn't have pipeline objects has nothing to do with Gallium having them or not.

Additionally, Gallium
is not a good match for Vulkan, and a whole new driver interface for
Vulkan, very different from Gallium, will need to be introduced --
though it's more likely people will want to implement the Vulkan functions
directly and skip the interface.

Gallium is a HAL; Gallium CSOs don't always match HW objects; pipeline state objects are a solution to that mismatch. Hence there is merit in considering pipeline state objects for Gallium in their own right, _even_ if nobody ever writes a Metal/Vulkan/D3D12 driver for Gallium.




Jose


For some reason, st/mesa assumes that shaders can't be shared by
multiple contexts and therefore has a context pointer in shader keys.

PS: As I write this, I can't help having a feeling of deja-vu. I couldn't
find any discussion in my mail archives, but I think this subject has been
brought up before.

I don't think this has been brought up before.


I wonder if we should start maintaining a "design.rst" in src/gallium/docs/
to document why some things were done the way they were.

In this case, I believe per-context shaders were done purely because
of bad decisions and out-of-tree dependencies:
- classic drivers had (mostly) per-context shaders, so gallium has them too
- DX9 and DX10 have per-context shaders

It's a false belief...

Jose
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
