On Tue, Apr 19, 2016 at 5:59 PM, Marek Olšák <[email protected]> wrote: > On Tue, Apr 19, 2016 at 3:11 PM, Oded Gabbay <[email protected]> wrote: >> On Mon, Apr 18, 2016 at 6:03 PM, >> Ilia Mirkin <[email protected]> wrote: >>> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <[email protected]> wrote: >>>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <[email protected]> wrote: >>>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <[email protected]> >>>>> wrote: >>>>>>> Wouldn't it make more sense to handle such issues in transfer_map? >>>>>>> (i.e. create a staging memory area, and decode into it)? This assumes >>>>>>> that the transfer_map() call has enough information to "do the right >>>>>>> thing". I don't think it does today, but perhaps it could be taught? >>>>>> It doesn't have all the info today, that's for sure. I imagine though >>>>>> we can add parameters to it. >>>>>> >>>>>>> That way everything that's in a pipe_resource is in some >>>>>>> tightly-controlled format, and we specify the LE <-> BE parameters >>>>>>> when converting between CPU-read/written and GPU-read/written data. I >>>>>>> believe this is a better match for what's really happening, too. What >>>>>>> do you think? >>>>>>> >>>>>>> -ilia >>>>>> >>>>>> Unless I'm missing something, I think, at the end of the day, it will >>>>>> be the same issues as in my solution - per code path per format is a >>>>>> different case. That's because you will still need to "teach" >>>>>> transfer_map, per each transfer per format what to do. So one will >>>>>> need to go and debug every single code path there is in mesa for >>>>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5 >>>>>> months. It's a great learning experience but it won't give anything >>>>>> generic. >>>>>> >>>>>> Again, for example, in st_ReadPixels, I imagine you will need to give >>>>>> "different orders" to transfer_map for the two different scenarios - >>>>>> H/W blit and fallback. So what's the gain here ? >>>>>> >>>>>> If I'm missing something, please tell me. >>>>> >>>>> One of us is... let's figure out which one :) >>>>> >>>>> Here's my proposal: >>>>> >>>>> All data stored inside of resources is stored in a driver-happy >>>>> format. The driver ensures that it's stored in proper endianness, etc. >>>>> (Much like it does today wrt proper stride.) >>>>> >>>>> Blitting(/copying) between resources doesn't require any additional >>>>> information, since you have the format(s) of the respective resources, >>>>> and it's all inside the driver, so the driver does whatever it needs >>>>> to do to make it all "work". >>>>> >>>>> *Accessing and modifying* resources (directly) from the CPU is what >>>>> becomes tricky. The state tracker may have incorrect expectations of >>>>> the actual backing data. There are a few different ways to resolve >>>>> this. The one I'm proposing is that you only ever return a pointer to >>>>> the directly underlying data if it matches the CPU's expectations >>>>> (which will only be the case for byte-oriented array formats like >>>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g. >>>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through >>>>> a bounce buffer. >>>>> >>>>> At transfer map time, you convert the data from GPU-style to >>>>> CPU-style, and copy back the relevant bits at unmap/flush time. >>>>> >>>>> This presents a nice clean boundary for this stuff. Instead of the >>>>> state tracker trying to guess what the driver will do and feeding it >>>>> endiannesses that it can't possibly guess properly, the tracking logic >>>>> is relegated to the driver, and we extend the interfaces to allow the >>>>> state tracker to access the data in a proper way. >>>>> >>>>> I believe the advantage of this scheme is that beyond adding format >>>>> parameters to pipe_transfer_map() calls, there will not need to be any >>>>> adjustments to the state trackers. >>>>> >>>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it >>>>> maps a buffer, it's formatless (at map time), and yet the GPU will be >>>>> required to interpret it correctly. We could decree that PIPE_BUFFER >>>>> is just *always* an array of R8_UNORM and thus never needs any type of >>>>> swapping. The driver needs to adjust accordingly to deal with accesses >>>>> that don't fit that pattern (and where parameters can't be fed to the >>>>> GPU to interpret it properly). >>>>> >>>>> I think something like the above will work. And I think it presents a >>>>> cleaner barrier than your proposal, because none of the "this GPU can >>>>> kinda-sorta understand BE, but not everywhere" details are ever >>>>> exposed to the state tracker. >>>>> >>>>> Thoughts? >>>>> >>>>> -ilia >>>> >>>> Ilia, >>>> >>>> To make the GPU do a conversion during blitting, I need to configure >>>> registers. This is done in a couple of functions in the r600g driver >>>> (r600_translate_texformat, r600_colorformat_endian_swap, >>>> r600_translate_colorformat and r600_translate_colorswap). >>>> >>>> The problem is that transfer_map/unmap don't call directly to those >>>> functions. They call other functions which eventually call those 4 >>>> functions. Among those "other" functions, there are several function >>>> calls which are *not* in the r600g driver. i.e. we go back to generic >>>> util functions. For example: >>>> >>>> #0 r600_translate_colorformat >>>> #1 evergreen_init_color_surface >>>> #2 evergreen_set_framebuffer_state >>>> #3 util_blitter_custom_depth_stencil >>>> #4 r600_blit_decompress_depth >>>> #5 r600_texture_transfer_map >>>> >>>> Am I allowed to now pass information from transfer_map/unmap all the >>>> way down to the 4 functions I mentioned through all these layers as >>>> additional parameters ? I preferred to put it in pipe_resource as that >>>> information goes all the way down to those functions, but if I can't >>>> use that, then what's an acceptable alternative ? >>>> >>>> This time, I would like to get an agreement *before* I implement it. >>> >>> Probably a good idea. And as issues are investigated, people's >>> opinions on the "correct" way might shift. Let's think about this... >>> >>> So clearly *a* correct way to handle this would be to stop all the >>> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g. >>> R5G6B5 but makes no mention of the byte layout in memory for those 16 >>> bits. Really what we have right now is a format and an *implicit* >>> endian ordering, which is the CPU's. But what happens when the CPU and >>> GPU don't agree? >>> >>> There's a path we could take which would be to add an endianness >>> alongside each format (be it by doubling formats, or an explicit >>> second field). This would be a very far-reaching change though, and I >>> doubt you'll want to do it. What we're left with is having a format >>> and an *implicit* endianness. Which means that the consumers of the >>> format need to be able to work out the implicit endianness involved. >>> And the endianness will be GPU endian for regular resources, and CPU >>> endian for "staging" resources. So it's definitely tempting to stick >>> the endian thing into a private field of the resource, like Rob is >>> suggesting - when creating a staging texture in >>> transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise >>> set it to gpu endian. And I think this is somewhat similar to your >>> former approach. >>> >>> What do you think? >>> >>> -ilia >> >> I don't think I have any other choice but to stick it as a private >> field, because the endian parameter simple can't go through all the >> function calls as an additional parameter. The reason is that >> set_framebuffer_state() function types are called from >> st_invalidate_state, where I don't have any idea about the "correct" >> endianess, so I can't add the endian parameter to that function type. >> >> The only thing that is propagated through all layers is r600_texture. >> I'll try to use that. >> >> Marek, Michel, >> Do you think it is OK to add the endian mark to that private structure ? > > Yes, but doesn't util_format_description::is_array provide that info already? > > Marek
If you mean to say that they are interchangeable, than no. It's true that for array formats we *never* need to do swaps, but for non-array formats there are cases where we need to do it and cases where we don't need to do it. For example, when writing to depth buffer through st_DrawPixels (glDrawPixels) with GL_FLOAT, than the format used is PIPE_FORMAT_Z32_FLOAT. In the make_texture() part, we need to configure endian swap for the staging buffer, but after you return from it and go into st_create_texture_sampler_view(), you want the destination texture to be configured without endian swapping. Both those buffers are configured with PIPE_FORMAT_Z32_FLOAT, so that's why is_array can't be used instead of a dedicated flag. And that's just the example I had now. There are more examples. Oded _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
