On Wed, Apr 20, 2016 at 12:35 PM, Marek Olšák <[email protected]> wrote: > On Wed, Apr 20, 2016 at 11:14 AM, Oded Gabbay <[email protected]> wrote: >> On Wed, Apr 20, 2016 at 12:04 PM, Michel Dänzer <[email protected]> wrote: >>> On 20.04.2016 17:48, Oded Gabbay wrote: >>>> On Wed, Apr 20, 2016 at 11:28 AM, Michel Dänzer <[email protected]> wrote: >>>>> On 20.04.2016 03:13, Oded Gabbay wrote: >>>>>> On Tue, Apr 19, 2016 at 5:59 PM, Marek Olšák <[email protected]> wrote: >>>>>>> On Tue, Apr 19, 2016 at 3:11 PM, Oded Gabbay <[email protected]> >>>>>>> wrote: >>>>>>>> On Mon, Apr 18, 2016 at 6:03 PM, >>>>>>>> Ilia Mirkin <[email protected]> wrote: >>>>>>>>> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <[email protected]> >>>>>>>>> wrote: >>>>>>>>>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay >>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>> Wouldn't it make more sense to handle such issues in transfer_map? >>>>>>>>>>>>> (i.e. create a staging memory area, and decode into it)? This >>>>>>>>>>>>> assumes >>>>>>>>>>>>> that the transfer_map() call has enough information to "do the >>>>>>>>>>>>> right >>>>>>>>>>>>> thing". I don't think it does today, but perhaps it could be >>>>>>>>>>>>> taught? >>>>>>>>>>>> It doesn't have all the info today, that's for sure. I imagine >>>>>>>>>>>> though >>>>>>>>>>>> we can add parameters to it. >>>>>>>>>>>> >>>>>>>>>>>>> That way everything that's in a pipe_resource is in some >>>>>>>>>>>>> tightly-controlled format, and we specify the LE <-> BE parameters >>>>>>>>>>>>> when converting between CPU-read/written and GPU-read/written >>>>>>>>>>>>> data. I >>>>>>>>>>>>> believe this is a better match for what's really happening, too. >>>>>>>>>>>>> What >>>>>>>>>>>>> do you think? >>>>>>>>>>>>> >>>>>>>>>>>>> -ilia >>>>>>>>>>>> >>>>>>>>>>>> Unless I'm missing something, I think, at the end of the day, it >>>>>>>>>>>> will >>>>>>>>>>>> be the same issues as in my solution - per code path per format is >>>>>>>>>>>> a >>>>>>>>>>>> different case. That's because you will still need to "teach" >>>>>>>>>>>> transfer_map, per each transfer per format what to do. So one will >>>>>>>>>>>> need to go and debug every single code path there is in mesa for >>>>>>>>>>>> drawing/copying/reading/textures/etc., like what I did in the last >>>>>>>>>>>> 1.5 >>>>>>>>>>>> months. It's a great learning experience but it won't give anything >>>>>>>>>>>> generic. >>>>>>>>>>>> >>>>>>>>>>>> Again, for example, in st_ReadPixels, I imagine you will need to >>>>>>>>>>>> give >>>>>>>>>>>> "different orders" to transfer_map for the two different scenarios >>>>>>>>>>>> - >>>>>>>>>>>> H/W blit and fallback. So what's the gain here ? >>>>>>>>>>>> >>>>>>>>>>>> If I'm missing something, please tell me. >>>>>>>>>>> >>>>>>>>>>> One of us is... let's figure out which one :) >>>>>>>>>>> >>>>>>>>>>> Here's my proposal: >>>>>>>>>>> >>>>>>>>>>> All data stored inside of resources is stored in a driver-happy >>>>>>>>>>> format. The driver ensures that it's stored in proper endianness, >>>>>>>>>>> etc. >>>>>>>>>>> (Much like it does today wrt proper stride.) >>>>>>>>>>> >>>>>>>>>>> Blitting(/copying) between resources doesn't require any additional >>>>>>>>>>> information, since you have the format(s) of the respective >>>>>>>>>>> resources, >>>>>>>>>>> and it's all inside the driver, so the driver does whatever it needs >>>>>>>>>>> to do to make it all "work". >>>>>>>>>>> >>>>>>>>>>> *Accessing and modifying* resources (directly) from the CPU is what >>>>>>>>>>> becomes tricky. The state tracker may have incorrect expectations of >>>>>>>>>>> the actual backing data. There are a few different ways to resolve >>>>>>>>>>> this. The one I'm proposing is that you only ever return a pointer >>>>>>>>>>> to >>>>>>>>>>> the directly underlying data if it matches the CPU's expectations >>>>>>>>>>> (which will only be the case for byte-oriented array formats like >>>>>>>>>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g. >>>>>>>>>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go >>>>>>>>>>> through >>>>>>>>>>> a bounce buffer. >>>>>>>>>>> >>>>>>>>>>> At transfer map time, you convert the data from GPU-style to >>>>>>>>>>> CPU-style, and copy back the relevant bits at unmap/flush time. >>>>>>>>>>> >>>>>>>>>>> This presents a nice clean boundary for this stuff. Instead of the >>>>>>>>>>> state tracker trying to guess what the driver will do and feeding it >>>>>>>>>>> endiannesses that it can't possibly guess properly, the tracking >>>>>>>>>>> logic >>>>>>>>>>> is relegated to the driver, and we extend the interfaces to allow >>>>>>>>>>> the >>>>>>>>>>> state tracker to access the data in a proper way. >>>>>>>>>>> >>>>>>>>>>> I believe the advantage of this scheme is that beyond adding format >>>>>>>>>>> parameters to pipe_transfer_map() calls, there will not need to be >>>>>>>>>>> any >>>>>>>>>>> adjustments to the state trackers. >>>>>>>>>>> >>>>>>>>>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it >>>>>>>>>>> maps a buffer, it's formatless (at map time), and yet the GPU will >>>>>>>>>>> be >>>>>>>>>>> required to interpret it correctly. We could decree that PIPE_BUFFER >>>>>>>>>>> is just *always* an array of R8_UNORM and thus never needs any type >>>>>>>>>>> of >>>>>>>>>>> swapping. The driver needs to adjust accordingly to deal with >>>>>>>>>>> accesses >>>>>>>>>>> that don't fit that pattern (and where parameters can't be fed to >>>>>>>>>>> the >>>>>>>>>>> GPU to interpret it properly). >>>>>>>>>>> >>>>>>>>>>> I think something like the above will work. And I think it presents >>>>>>>>>>> a >>>>>>>>>>> cleaner barrier than your proposal, because none of the "this GPU >>>>>>>>>>> can >>>>>>>>>>> kinda-sorta understand BE, but not everywhere" details are ever >>>>>>>>>>> exposed to the state tracker. >>>>>>>>>>> >>>>>>>>>>> Thoughts? >>>>>>>>>>> >>>>>>>>>>> -ilia >>>>>>>>>> >>>>>>>>>> Ilia, >>>>>>>>>> >>>>>>>>>> To make the GPU do a conversion during blitting, I need to configure >>>>>>>>>> registers. This is done in a couple of functions in the r600g driver >>>>>>>>>> (r600_translate_texformat, r600_colorformat_endian_swap, >>>>>>>>>> r600_translate_colorformat and r600_translate_colorswap). >>>>>>>>>> >>>>>>>>>> The problem is that transfer_map/unmap don't call directly to those >>>>>>>>>> functions. They call other functions which eventually call those 4 >>>>>>>>>> functions. Among those "other" functions, there are several function >>>>>>>>>> calls which are *not* in the r600g driver. i.e. we go back to generic >>>>>>>>>> util functions. For example: >>>>>>>>>> >>>>>>>>>> #0 r600_translate_colorformat >>>>>>>>>> #1 evergreen_init_color_surface >>>>>>>>>> #2 evergreen_set_framebuffer_state >>>>>>>>>> #3 util_blitter_custom_depth_stencil >>>>>>>>>> #4 r600_blit_decompress_depth >>>>>>>>>> #5 r600_texture_transfer_map >>>>>>>>>> >>>>>>>>>> Am I allowed to now pass information from transfer_map/unmap all the >>>>>>>>>> way down to the 4 functions I mentioned through all these layers as >>>>>>>>>> additional parameters ? I preferred to put it in pipe_resource as >>>>>>>>>> that >>>>>>>>>> information goes all the way down to those functions, but if I can't >>>>>>>>>> use that, then what's an acceptable alternative ? >>>>>>>>>> >>>>>>>>>> This time, I would like to get an agreement *before* I implement it. >>>>>>>>> >>>>>>>>> Probably a good idea. And as issues are investigated, people's >>>>>>>>> opinions on the "correct" way might shift. Let's think about this... >>>>>>>>> >>>>>>>>> So clearly *a* correct way to handle this would be to stop all the >>>>>>>>> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g. >>>>>>>>> R5G6B5 but makes no mention of the byte layout in memory for those 16 >>>>>>>>> bits. Really what we have right now is a format and an *implicit* >>>>>>>>> endian ordering, which is the CPU's. But what happens when the CPU and >>>>>>>>> GPU don't agree? >>>>>>>>> >>>>>>>>> There's a path we could take which would be to add an endianness >>>>>>>>> alongside each format (be it by doubling formats, or an explicit >>>>>>>>> second field). This would be a very far-reaching change though, and I >>>>>>>>> doubt you'll want to do it. What we're left with is having a format >>>>>>>>> and an *implicit* endianness. Which means that the consumers of the >>>>>>>>> format need to be able to work out the implicit endianness involved. >>>>>>>>> And the endianness will be GPU endian for regular resources, and CPU >>>>>>>>> endian for "staging" resources. So it's definitely tempting to stick >>>>>>>>> the endian thing into a private field of the resource, like Rob is >>>>>>>>> suggesting - when creating a staging texture in >>>>>>>>> transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise >>>>>>>>> set it to gpu endian. And I think this is somewhat similar to your >>>>>>>>> former approach. >>>>>>>>> >>>>>>>>> What do you think? >>>>>>>>> >>>>>>>>> -ilia >>>>>>>> >>>>>>>> I don't think I have any other choice but to stick it as a private >>>>>>>> field, because the endian parameter simple can't go through all the >>>>>>>> function calls as an additional parameter. The reason is that >>>>>>>> set_framebuffer_state() function types are called from >>>>>>>> st_invalidate_state, where I don't have any idea about the "correct" >>>>>>>> endianess, so I can't add the endian parameter to that function type. >>>>>>>> >>>>>>>> The only thing that is propagated through all layers is r600_texture. >>>>>>>> I'll try to use that. >>>>>>>> >>>>>>>> Marek, Michel, >>>>>>>> Do you think it is OK to add the endian mark to that private structure >>>>>>>> ? >>>>>>> >>>>>>> Yes, but doesn't util_format_description::is_array provide that info >>>>>>> already? >>>>>> >>>>>> If you mean to say that they are interchangeable, than no. >>>>>> It's true that for array formats we *never* need to do swaps, but for >>>>>> non-array formats there are cases where we need to do it and cases >>>>>> where we don't need to do it. >>>>>> >>>>>> For example, when writing to depth buffer through st_DrawPixels >>>>>> (glDrawPixels) with GL_FLOAT, than the format used is >>>>>> PIPE_FORMAT_Z32_FLOAT. >>>>>> In the make_texture() part, we need to configure endian swap for the >>>>>> staging buffer, but after you return from it and go into >>>>>> st_create_texture_sampler_view(), you want the destination texture to >>>>>> be configured without endian swapping. >>>>> >>>>> So the CB/DB block can't swap bytes when accessing Z32? >>>>> >>>> Not sure what you mean by this sentence. Could you please further >>>> explain your question ? >>> >>> For packed formats / components such as Z32, the working theory is that >>> state tracker / Mesa core / application code accesses them using the CPU >>> byte order. So if the hardware can be programmed to always access them >>> in the CPU byte order, that should theoretically work in all cases. >>> >>> >>> -- >>> Earthling Michel Dänzer | http://www.amd.com >>> Libre software enthusiast | Mesa and X developer >> >> Well, it doesn't work in practice. If that were true, than we wouldn't >> need all this hints. We would do swaps all the time. >> >> And if you think about it, once you use a staging buffer in the >> middle, than you don't want to do two swaps. If you do two swaps (from >> CPU to staging buffer and from staging buffer to GPU), you will return >> to the original layout, which is big-endian in our case, which the GPU >> can't handle. > > Yeah, it looks like DB can't do swaps, so you need to do the swapping > via TEX or CB. > > This combination of flags tells you that a texture can only be used by > DB and TEX: > "rtex->is_depth && !rtex->is_flushing_texture" > This should be the only category of textures that must be in the GPU > native format because of DB. > > Other textures can be in the CPU native format, because CB and TEX can > do the swapping. > > R600_RESOURCE_FLAG_TRANSFER tells you which texture is a temporary > transfer texture. Such a texture can only be used by CB and TEX. Not > sure if this is useful, because you can already tell: > - which texture is array or packed and set the endian flags accordingly > - which texture can be used by DB and can't set the endian flags > > Marek
Hi Marek, I think your DB analysis is spot on! I hope I can send a new version of the patch-set today without any need for a new flag. Oded _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
