On Sun, Apr 8, 2018 at 5:20 PM, Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:
>>>>>>>>> +
>>>>>>>>> + /** The mode of the underlying variable */
>>>>>>>>> + nir_variable_mode mode;
>>>>>>>>
>>>>>>>> In fact, it seems like deref->mode is unused outside of nir_print and
>>>>>>>> nir_validate.. for logical addressing we can get the mode from the
>>>>>>>> deref_var->var at the start of the chain, and deref->mode has no
>>>>>>>> meaning for physical addressing (where the mode comes from the
>>>>>>>> pointer).
>>>>>>>>
>>>>>>>> So maybe just drop deref->mode?
>>>>>>>
>>>>>>> Isn't it still useful with logical addressing in case a var is not
>>>>>>> immediately available? (think VK_KHR_variable_pointers)
>>>>>>
>>>>>> not sure, maybe this should just also use fat-pointers like physical
>>>>>> addressing does??
>>>>>>
>>>>>>> Also I could see this being useful in physical addressing too to avoid
>>>>>>> all passes working with derefs needing to do the constant folding?
>>>>>>
>>>>>> The problem is that you don't necessarily know the type at compile
>>>>>> time (and in the case where you do, you need to do constant folding to
>>>>>> figure it out)
>>>>>
>>>>> So I have two considerations here
>>>>>
>>>>> 1) for vulkan you always know the mode, even when you don't know the var.
>>>>> 2) In CL the mode can still get annotated in the source program (CL C
>>>>> non-generic pointers) in cases in which we cannot reasonably figure it
>>>>> out with just constant folding. In those cases the mode is extra
>>>>> information that you really lose.
>>>>
>>>> so, even in cl 1.x, you could do things like 'somefxn(foo ? global_ptr
>>>> : local_ptr)'.. depending on how much we inline all the things, that
>>>> might not get CF'd away.
>
> How does this even work btw? somefxn has a definition, and the
> definition specifies a mode for the argument right? (which is
> implicitly __private if the app does not specify anything?)
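The point above, that for logical addressing the mode can be recovered from the deref_var at the start of the chain rather than stored on every deref, can be sketched as follows. This is a minimal illustration under stated assumptions: `variable`, `deref`, `var_mode` and `deref_chain_mode()` are hypothetical simplified stand-ins, not NIR's real `nir_variable` / `nir_deref_instr` API. A chain that starts with a cast simply reports whatever mode the cast was annotated with, which may be unknown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures discussed in the
 * thread; these are NOT Mesa's real nir_variable / nir_deref_instr. */
typedef enum {
    MODE_UNKNOWN = 0,
    MODE_GLOBAL,
    MODE_SHARED,
    MODE_PRIVATE,
    MODE_CONSTANT
} var_mode;

typedef enum { DEREF_VAR, DEREF_CAST, DEREF_ARRAY, DEREF_STRUCT } deref_kind;

typedef struct variable {
    const char *name;
    var_mode mode;
} variable;

typedef struct deref {
    deref_kind kind;
    const struct deref *parent; /* NULL only at the head of the chain */
    const variable *var;        /* set only when kind == DEREF_VAR */
    var_mode cast_mode;         /* mode annotated on a cast, if any */
} deref;

/* Derive the mode by walking back to the head of the deref chain instead
 * of storing a mode field on every deref. */
static var_mode deref_chain_mode(const deref *d)
{
    assert(d != NULL);
    while (d->parent != NULL)
        d = d->parent;

    switch (d->kind) {
    case DEREF_VAR:
        return d->var->mode;  /* logical addressing: mode of the variable */
    case DEREF_CAST:
        return d->cast_mode;  /* may be MODE_UNKNOWN, e.g. a generic pointer */
    default:
        return MODE_UNKNOWN;  /* malformed chain: head must be a var or cast */
    }
}
```

The walk is linear in the chain length, which is typically a handful of derefs, so the cost of not caching the mode on each deref stays small in this sketch.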
iirc, the cl spec has an example something along these lines.. it doesn't require *physical* storage for anything where you don't know what the ptr type is, however.. so fat ptrs in ssa space work out

>>>
>>> But something like
>>> __constant int *ptr_value = ...;
>>> store ptr in complex data structure.
>>> __constant int* ptr2 = load from complex data structure.
>>>
>>> Without explicitly annotating ptr2 it is unlikely that constant
>>> folding would find that ptr2 is pointing to __constant address space.
>>> Hence removing the modes loses valuable information that you cannot
>>> get back by constant folding. However, if you have a pointer with
>>> unknown mode, we could have a special mode (or mode_all?) and you can
>>> use the uvec2 representation in that case?
>>
>> hmm, I'm not really getting how deref->mode could magically have
>> information that fatptr.y doesn't have.. if the mode is known, vtn
>> could stash it in fatptr.y and everyone is happy? If vtn doesn't know
>> this, then I don't see how deref->mode helps..
>
> You mean insert it into the fatptr every time deref_cast is called?
>
> Wouldn't that blow up the IR size significantly for very little benefit?

in an easy to clean up way, so meh?

>
>>
>>>>
>>>> I think I'm leaning towards using fat ptrs for the vk case, since I
>>>> guess that is a case where you could always expect
>>>> nir_src_as_const_value() to work, to get the variable mode. If for no
>>>> other reason than I guess these deref's, if the var is not known,
>>>> start w/ deref_cast, and it would be ugly for deref_cast to have to
>>>> work differently for compute vs vk. But maybe Jason already has some
>>>> thoughts about it?
>>>
>>> I'd like to avoid fat pointers altogether on AMD since we would not
>>> use it even for CL. a generic pointer is just a uint64_t for us, with
>>> no bitfield in there for the address space.
>>>
>>> I think we may need to think a bit more about representation however,
>>> as e.g. for AMD a pointer is typically 64-bits (but we can do e.g.
>>> 32-bits for known workgroup pointers), the current deref instructions
>>> return 32-bit, and you want something like a uvec2 as pointer
>>> representation?
>>
>> afaiu, newer AMD (and NV) hw can remap shared/private into a single
>> global address space.. But I guess that is an easy subset of the
>> harder case where drivers need to use different instructions.. so a
>> pretty simple lowering pass run before lower_io could remap things
>> that use fatptrs into something that ignores fatptr.y. Then opt
>> passes make fatptr.y go away. So both AMD and hw that doesn't have a
>> flat address space are happy.
>
> But then you run into other issues, like how are you going to stuff a
> 64-bit fatptr.x + a ?-bit fatptr.y into a 64-bit value for Physical64
> addressing? Also this means we have to track the sources back to
> the cast/var any time we want to do anything at all with any deref
> which seems less efficient to me than just stuffing the deref in
> there.

so fat ptrs only have to exist in ssa space, not be stored to something with a physically defined size.. As far as tracking things to the head of the chain of deref instructions, that is a matter of a simple helper or two. Not like the chain of derefs is going to be 1000's of instructions..

> Also, what would the something which ignores fatptr.y be? I'd assume
> that would be the normal deref based stuff, but requiring fatptr
> contradicts that?

if you have a flat address space, maybe a pass (or option for lower_io) to just convert everything to load/store_global (since essentially what these GPUs are doing is remapping shared/private into the global address space)

BR,
-R
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
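The fat-pointer scheme debated in this thread (fatptr.x holds the address, fatptr.y the address space, and the pair exists only in SSA values) can be sketched as follows. Everything here is hypothetical and for illustration only: `fatptr`, `addr_space`, `flat_aperture`, the aperture bases, and the `load_op` selection are assumptions, not Mesa code. `fatptr_to_flat()` models the lowering for hw with a flat address space, where shared/private are remapped into the global address space and y becomes dead. `fatptr_load_op()` models hw without a flat address space, where the mode selects a distinct instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address-space tags that a frontend (e.g. vtn) might stash
 * in fatptr.y when it knows the mode. */
typedef enum {
    AS_GLOBAL = 0,
    AS_SHARED = 1,
    AS_PRIVATE = 2,
    AS_CONSTANT = 3
} addr_space;

/* A fat pointer that lives only in SSA values: x is the 64-bit address,
 * y the address space. It is never stored to memory, so it does not have
 * to fit into a 64-bit Physical64 pointer. */
typedef struct {
    uint64_t x;
    uint32_t y;
} fatptr;

static fatptr fatptr_make(uint64_t addr, addr_space as)
{
    assert((unsigned)as <= AS_CONSTANT);
    fatptr p = { addr, (uint32_t)as };
    return p;
}

/* Hypothetical bases at which a flat-address-space GPU maps the shared
 * and private windows into the global address space. */
typedef struct {
    uint64_t shared_base;
    uint64_t private_base;
} flat_aperture;

/* Lowering for hw with a flat address space: fold fatptr.y into a single
 * global address, after which y is unused and can be optimized away. */
static uint64_t fatptr_to_flat(fatptr p, const flat_aperture *ap)
{
    switch (p.y) {
    case AS_SHARED:
        return ap->shared_base + p.x;
    case AS_PRIVATE:
        return ap->private_base + p.x;
    default: /* AS_GLOBAL, AS_CONSTANT: already a global address */
        return p.x;
    }
}

/* Hypothetical instruction selection for hw without a flat address
 * space: the mode in fatptr.y picks a distinct load opcode. */
typedef enum { LOAD_GLOBAL, LOAD_SHARED, LOAD_SCRATCH, LOAD_CONST } load_op;

static load_op fatptr_load_op(fatptr p)
{
    switch (p.y) {
    case AS_SHARED:
        return LOAD_SHARED;
    case AS_PRIVATE:
        return LOAD_SCRATCH;
    case AS_CONSTANT:
        return LOAD_CONST;
    default:
        return LOAD_GLOBAL;
    }
}
```

Because the sketch only ever passes `fatptr` by value, the question of packing a 64-bit x plus a mode into a 64-bit stored pointer does not arise; a stored generic pointer would still need some other representation.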