On 07.12.2016 21:46, Marek Olšák wrote:
> On Wed, Dec 7, 2016 at 6:00 PM, Roland Scheidegger <srol...@vmware.com> wrote:
>> On 07.12.2016 17:26, Marek Olšák wrote:
>>> Optimizing the CSO cache isn't exactly on the top of my list, so I
>>> can't really do that right now.
>>>
>>> I think that varying the LOD bias is starting to be common. It's used
>>> for smooth LOD transitions when loading textures during rendering.
>>> Games with lots of content typically do that. This particular Batman
>>> game uses UE3.
>> The question of course is if they do it via sampler state or via shader
>> lod bias.
>
> The sampler state. I saw those states. lod_bias was the only changing
> variable.

I meant in general, not for this particular app.
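At the GL level the two mechanisms being contrasted look roughly like this
(a minimal sketch; the function name and the fade-in comment are invented
for illustration, not taken from the app or from Wine):

#include <GL/gl.h>
#include <GL/glext.h>  /* glSamplerParameterf/glBindSampler are GL 3.3+;
                        * an extension loader would normally provide them */

/* Path 1: sampler-state LOD bias.  Every new bias value hands the driver a
 * new sampler state, so the CSO cache grows by one entry per distinct bias. */
static void draw_with_sampler_bias(GLuint sampler, float bias)
{
   glSamplerParameterf(sampler, GL_TEXTURE_LOD_BIAS, bias);
   glBindSampler(0, sampler);
   /* ... draw ... */
}

/* Path 2: shader LOD bias.  The bias is an operand of the texture fetch and
 * never touches the sampler state:
 *
 *    uniform float u_bias;
 *    ...
 *    color = texture(u_tex, uv, u_bias);
 *
 * A streaming fade-in would call path 1 every frame with a bias shrinking
 * toward 0, so a single texture fade can create many sampler states. */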
>
>> I suppose that when these objects were designed no one thought it would
>> be useful to really create sampler state with lots of different bias
>> values. d3d10 of course would have the same problem (and it has limits
>> on how many such objects you can create, 4096 per context) but the
>> problem shifted to the app since it would have to create the objects
>> explicitly - I would suspect the app there would either quantize the
>> value itself, or use shader lod bias instead.
>
> The shader lod bias isn't a better solution though. Any LOD bias is a
> modifier of the varying LOD value. For texture streaming, you want to
> clamp the LOD, not shift it, thus min_lod is better. However, min_lod
> is integer on ATI DX9 GPUs (not sure about the API), so DX9 games
> can't use it for smooth transitions. That may explain why we are
> seeing it with Wine. I guess any DX10+ and GL3+ games do use min_lod
> instead of lod_bias, which means we probably get a lot of sampler
> states there too.

Oh, I missed that this was dx9. DX9 otherwise only has D3DSAMP_MAXMIPLEVEL,
which is integer and equivalent to gl's base level (restricting the lod
range directly is impossible, albeit I suppose you might translate that
MAXMIPLEVEL to min_lod without texture views).

>
> We could reduce the size of pipe_sampler_state a little. AMD GCN can
> represent only 14 bits of lod_bias and 12 bits of min_lod and max_lod.

So min_lod and max_lod only have 6 fractional bits whereas lod_bias has 8?
In any case even 12 bits is quite a lot of objects you could possibly
have... I'm not sure though if using some fixed-point integer in
pipe_sampler_state really makes a lot of sense.

Roland
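Concretely, the quantization under discussion would look something like
this (a sketch only: s_fixed() and the 5.8/4.8 split with 8 fractional bits
are assumptions for illustration, not radeonsi code or a confirmed hardware
layout - which split is right is exactly the open question above):

#include <math.h>
#include <stdint.h>

/* Hypothetical fixed-point packing.  With frac_bits = 8 the 14-bit lod_bias
 * would be signed 5.8 and the 12-bit min_lod/max_lod unsigned 4.8; with
 * frac_bits = 6 the clamp fields would be 6.6 instead. */
static int32_t s_fixed(float v, unsigned frac_bits)
{
   return (int32_t)lrintf(v * (float)(1 << frac_bits));
}

static uint32_t pack_lod_bias_14(float bias)
{
   /* e.g. clamp to an API range of [-16, 16] before packing */
   return (uint32_t)s_fixed(fminf(fmaxf(bias, -16.0f), 16.0f), 8) & 0x3fff;
}

static uint32_t pack_lod_clamp_12(float lod)
{
   /* min_lod/max_lod clamp to the mip range, e.g. [0, 15] */
   return (uint32_t)s_fixed(fminf(fmaxf(lod, 0.0f), 15.0f), 8) & 0xfff;
}

Even quantized like that, 12-14 bits still allow thousands of distinct
states, so it wouldn't change the cache-size problem by itself.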
>
> Marek
>
>> In any case in the end you still have to program the hw differently for
>> all these different bias values, albeit quantized to whatever the hw
>> supports.
>>
>> Roland
>>
>>>
>>> I used to play Borderlands 2 (UE3), it had smooth LOD transitions too,
>>> but it never hung.
>>>
>>> Does this patch have your Rb?
>>>
>>> Thanks,
>>> Marek
>>>
>>> On Wed, Dec 7, 2016 at 11:16 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>>>> On 07.12.2016 08:50, Michel Dänzer wrote:
>>>>> On 06/12/16 10:24 PM, Marek Olšák wrote:
>>>>>> On Mon, Dec 5, 2016 at 10:05 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>>> On 03/12/16 05:38 AM, Marek Olšák wrote:
>>>>>>>> From: Marek Olšák <marek.ol...@amd.com>
>>>>>>>>
>>>>>>>> This fixes random radeonsi GPU hangs in Batman Arkham: Origins (Wine)
>>>>>>>> and probably many other games too.
>>>>>>>>
>>>>>>>> cso_cache deletes sampler states when the cache size is too big and
>>>>>>>> doesn't check which sampler states are bound, causing use-after-free
>>>>>>>> in drivers. Because of that, radeonsi uploaded garbage sampler states
>>>>>>>> and the hardware went bananas. Other drivers may have experienced
>>>>>>>> similar issues.
>>>>>>>>
>>>>>>>> Cc: 13.0 12.0 <mesa-sta...@lists.freedesktop.org>
>>>>>>>> ---
>>>>>>>>  src/gallium/auxiliary/cso_cache/cso_cache.c | 4 +++-
>>>>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/src/gallium/auxiliary/cso_cache/cso_cache.c b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> index b240c93..1f3be4b 100644
>>>>>>>> --- a/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> +++ b/src/gallium/auxiliary/cso_cache/cso_cache.c
>>>>>>>> @@ -181,21 +181,23 @@ static inline void sanitize_cb(struct cso_hash *hash, enum cso_cache_type type,
>>>>>>>>        --to_remove;
>>>>>>>>     }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  struct cso_hash_iter
>>>>>>>>  cso_insert_state(struct cso_cache *sc,
>>>>>>>>                   unsigned hash_key, enum cso_cache_type type,
>>>>>>>>                   void *state)
>>>>>>>>  {
>>>>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>>>> -   sanitize_hash(sc, hash, type, sc->max_size);
>>>>>>>> +
>>>>>>>> +   if (type != CSO_SAMPLER)
>>>>>>>> +      sanitize_hash(sc, hash, type, sc->max_size);
>>>>>>>>
>>>>>>>>     return cso_hash_insert(hash, hash_key, state);
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  struct cso_hash_iter
>>>>>>>>  cso_find_state(struct cso_cache *sc,
>>>>>>>>                 unsigned hash_key, enum cso_cache_type type)
>>>>>>>>  {
>>>>>>>>     struct cso_hash *hash = _cso_hash_for_type(sc, type);
>>>>>>>>
>>>>>>>
>>>>>>> If I understand correctly, this means that the maximum cache size
>>>>>>> effectively isn't enforced for sampler states? Wouldn't it be better
>>>>>>> instead to add code which prevents currently bound states from being
>>>>>>> deleted from here?
>>>>>>
>>>>>> Not in this patch. Maybe later and if CPU profiling results show that
>>>>>> it matters.
>>>>>>
>>>>>> Currently, if the cache size is exceeded, the GPU usually hangs. If
>>>>>> pruning the cache was important, most apps would hang.
>>>>>>
>>>>>> Thus, I'd like to push this as-is.
>>>>>
>>>>> Fair enough, but this will require some follow-up work:
>>>>>
>>>>> Why are sampler states singled out? The same problem can at least
>>>>> theoretically happen with other CSOs as well, right?
>>>>
>>>> The other CSOs only have single bind points. So with other CSOs, what
>>>> happens is that:
>>>>
>>>> 1. State tracker wants to bind some state, creates the CSO.
>>>> 2. cso_cache gets cleaned out; this might delete the currently bound CSO.
>>>> 3. Currently bound CSO gets immediately overwritten with the new state
>>>>    anyway.
>>>>
>>>> Samplers are different because there are many bind points within a
>>>> single context.
>>>>
>>>> You're right though that the sounder approach would be to check all CSOs
>>>> against current bind points during cache cleanup, just in case a driver
>>>> does something with the old state on unbind.
>>>>
>>>>> The maximum cache size needs to be addressed, either by something like
>>>>> the above or by just removing the maximum cache size.
>>>>>
>>>>> BTW, if varying the LOD bias is common, maybe the LOD bias value should
>>>>> be tracked separately from the remaining sampler state?
>>>>
>>>> It's not common at all, actually. Some apps are just weird.
>>>>
>>>>> P.S. The bug has been there forever, so I don't see the justification
>>>>> for bypassing the normal stable branch process.
>>>>
>>>> And we've had mysterious hang bug reports forever as well. I wouldn't be
>>>> surprised if a large number or even all of them could be traced back to
>>>> this issue.
>>>>
>>>> Nicolai
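As a rough illustration of the follow-up discussed above - checking states
against the current bind points before pruning, so the size cap could stay
in force for samplers too - a cleanup pass might look like this. This is
not existing cso_cache/cso_context code; the sampler_cache struct,
is_bound() and the flat entries[] array are simplifications invented for
the sketch:

#include <stdbool.h>
#include <stddef.h>

#define NUM_SHADER_STAGES 6
#define MAX_SAMPLERS      32

struct sampler_cache {
   void  *bound[NUM_SHADER_STAGES][MAX_SAMPLERS]; /* currently bound CSOs */
   void  *entries[4096];                          /* cached CSOs */
   size_t num_entries;
   size_t max_size;
};

/* Is this CSO referenced by any bind point of any shader stage? */
static bool is_bound(const struct sampler_cache *c, const void *cso)
{
   for (int s = 0; s < NUM_SHADER_STAGES; s++)
      for (int i = 0; i < MAX_SAMPLERS; i++)
         if (c->bound[s][i] == cso)
            return true;
   return false;
}

/* Prune the cache down to max_size, but never free a bound state. */
static void sanitize(struct sampler_cache *c, void (*destroy)(void *cso))
{
   if (c->num_entries <= c->max_size)
      return;

   size_t to_remove = c->num_entries - c->max_size;
   size_t kept = 0;

   for (size_t i = 0; i < c->num_entries; i++) {
      if (to_remove > 0 && !is_bound(c, c->entries[i])) {
         destroy(c->entries[i]);
         to_remove--;
      } else {
         c->entries[kept++] = c->entries[i];
      }
   }
   c->num_entries = kept;
}

With something along those lines, a driver that still looks at the old
state on unbind would also stay safe, which covers the caveat mentioned
above.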