On Mon, Apr 18, 2016 at 6:03 PM, Ilia Mirkin <[email protected]> wrote: > On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <[email protected]> wrote: >> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <[email protected]> wrote: >>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <[email protected]> wrote: >>>>> Wouldn't it make more sense to handle such issues in transfer_map? >>>>> (i.e. create a staging memory area, and decode into it)? This assumes >>>>> that the transfer_map() call has enough information to "do the right >>>>> thing". I don't think it does today, but perhaps it could be taught? >>>> It doesn't have all the info today, that's for sure. I imagine though >>>> we can add parameters to it. >>>> >>>>> That way everything that's in a pipe_resource is in some >>>>> tightly-controlled format, and we specify the LE <-> BE parameters >>>>> when converting between CPU-read/written and GPU-read/written data. I >>>>> believe this is a better match for what's really happening, too. What >>>>> do you think? >>>>> >>>>> -ilia >>>> >>>> Unless I'm missing something, I think, at the end of the day, it will >>>> be the same issues as in my solution - per code path per format is a >>>> different case. That's because you will still need to "teach" >>>> transfer_map, per each transfer per format what to do. So one will >>>> need to go and debug every single code path there is in mesa for >>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5 >>>> months. It's a great learning experience but it won't give anything >>>> generic. >>>> >>>> Again, for example, in st_ReadPixels, I imagine you will need to give >>>> "different orders" to transfer_map for the two different scenarios - >>>> H/W blit and fallback. So what's the gain here ? >>>> >>>> If I'm missing something, please tell me. >>> >>> One of us is... let's figure out which one :) >>> >>> Here's my proposal: >>> >>> All data stored inside of resources is stored in a driver-happy >>> format. The driver ensures that it's stored in proper endianness, etc. >>> (Much like it does today wrt proper stride.) >>> >>> Blitting(/copying) between resources doesn't require any additional >>> information, since you have the format(s) of the respective resources, >>> and it's all inside the driver, so the driver does whatever it needs >>> to do to make it all "work". >>> >>> *Accessing and modifying* resources (directly) from the CPU is what >>> becomes tricky. The state tracker may have incorrect expectations of >>> the actual backing data. There are a few different ways to resolve >>> this. The one I'm proposing is that you only ever return a pointer to >>> the directly underlying data if it matches the CPU's expectations >>> (which will only be the case for byte-oriented array formats like >>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g. >>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through >>> a bounce buffer. >>> >>> At transfer map time, you convert the data from GPU-style to >>> CPU-style, and copy back the relevant bits at unmap/flush time. >>> >>> This presents a nice clean boundary for this stuff. Instead of the >>> state tracker trying to guess what the driver will do and feeding it >>> endiannesses that it can't possibly guess properly, the tracking logic >>> is relegated to the driver, and we extend the interfaces to allow the >>> state tracker to access the data in a proper way. >>> >>> I believe the advantage of this scheme is that beyond adding format >>> parameters to pipe_transfer_map() calls, there will not need to be any >>> adjustments to the state trackers. >>> >>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it >>> maps a buffer, it's formatless (at map time), and yet the GPU will be >>> required to interpret it correctly. We could decree that PIPE_BUFFER >>> is just *always* an array of R8_UNORM and thus never needs any type of >>> swapping. The driver needs to adjust accordingly to deal with accesses >>> that don't fit that pattern (and where parameters can't be fed to the >>> GPU to interpret it properly). >>> >>> I think something like the above will work. And I think it presents a >>> cleaner barrier than your proposal, because none of the "this GPU can >>> kinda-sorta understand BE, but not everywhere" details are ever >>> exposed to the state tracker. >>> >>> Thoughts? >>> >>> -ilia >> >> Ilia, >> >> To make the GPU do a conversion during blitting, I need to configure >> registers. This is done in a couple of functions in the r600g driver >> (r600_translate_texformat, r600_colorformat_endian_swap, >> r600_translate_colorformat and r600_translate_colorswap). >> >> The problem is that transfer_map/unmap don't call directly to those >> functions. They call other functions which eventually call those 4 >> functions. Among those "other" functions, there are several function >> calls which are *not* in the r600g driver. i.e. we go back to generic >> util functions. For example: >> >> #0 r600_translate_colorformat >> #1 evergreen_init_color_surface >> #2 evergreen_set_framebuffer_state >> #3 util_blitter_custom_depth_stencil >> #4 r600_blit_decompress_depth >> #5 r600_texture_transfer_map >> >> Am I allowed to now pass information from transfer_map/unmap all the >> way down to the 4 functions I mentioned through all these layers as >> additional parameters ? I preferred to put it in pipe_resource as that >> information goes all the way down to those functions, but if I can't >> use that, then what's an acceptable alternative ? >> >> This time, I would like to get an agreement *before* I implement it. > > Probably a good idea. And as issues are investigated, people's > opinions on the "correct" way might shift. Let's think about this... > > So clearly *a* correct way to handle this would be to stop all the > lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g. > R5G6B5 but makes no mention of the byte layout in memory for those 16 > bits. Really what we have right now is a format and an *implicit* > endian ordering, which is the CPU's. But what happens when the CPU and > GPU don't agree? > > There's a path we could take which would be to add an endianness > alongside each format (be it by doubling formats, or an explicit > second field). This would be a very far-reaching change though, and I > doubt you'll want to do it. What we're left with is having a format > and an *implicit* endianness. Which means that the consumers of the > format need to be able to work out the implicit endianness involved. > And the endianness will be GPU endian for regular resources, and CPU > endian for "staging" resources. So it's definitely tempting to stick > the endian thing into a private field of the resource, like Rob is > suggesting - when creating a staging texture in > transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise > set it to gpu endian. And I think this is somewhat similar to your > former approach. > > What do you think? > > -ilia
I don't think I have any other choice but to stick it as a private field, because the endian parameter simple can't go through all the function calls as an additional parameter. The reason is that set_framebuffer_state() function types are called from st_invalidate_state, where I don't have any idea about the "correct" endianess, so I can't add the endian parameter to that function type. The only thing that is propagated through all layers is r600_texture. I'll try to use that. Marek, Michel, Do you think it is OK to add the endian mark to that private structure ? Oded _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
