On 09/14/2012 07:01 PM, Paul Berry wrote: > On 11 September 2012 12:04, Chad Versace <chad.vers...@linux.intel.com>wrote:
>> + if (!image->mt || >> + image->mt->region->tiling != I915_TILING_X) { >> + /* The algorithm below is written only for X-tiled memory. >> + * >> + * Memcpy'ing to Y-tiled memory is likely faster when done the >> + * conventional way: that is, map the memory through a fence in >> order to >> + * acquire a linear view of the memory, then memcpy one row at a >> time. >> + * Due to the arrangement of Y-tiled memory, if we were to upload >> + * a Y-tiled buffer using the algorithm below, then we could only >> copy >> + * one byte at a time. >> > > Nit pick: actually Y tiling only rearranges octwords, so you could copy 16 > bytes at a time for y tiling. Still might not be worth it but one of these > days we should probably investigate. In v3, I will remove the incorrect comments about Y-tiling, and thus reduce the comment to just the first line: "The algorithm below is written only for X-tiled memory." >> + /* In the tiling algorithm below, some variables are in units of >> pixels, >> + * others are in units of bytes, and others (such as height) are >> unitless. >> + * Each variable name is either prefixed or suffixed with its units. >> + */ >> > > Thank you for doing this--it makes the code a lot easier to follow. It > looks like everything uses suffixing except pixel_max_x and pixel_max_y. > Could we just change those to max_x_pixels and max_y_pixels? Also, for > consistency it would be nice to rename tile_height to tile_height_pixels. pixel_x, pixel_y, byte_offset_y, and byte_offset also used prefixing, just as pixel_max_[xy]_pixels do. In v3, I will consistently use prefixing for all. tile_height is unitless. For example, right: tile_size_bytes = tile_width_bytes * tile_height wrong: tile_size_bytes2 = tile_width_bytes * tile_height_bytes >> + uint8_t *map = bo->virtual >> + + image_offset_y * stride_bytes >> + + image_offset_x * cpp; >> > > I don't think this will work. The way the miplevel image offsets work is > that the parts of the mipmap other than level/layer zero are offset within > the existing X tiling pattern. What you're doing is offsetting the origin > of the map assuming no tiling, and then applying the X tiling pattern to > the offset image. In the best case you'll just wind up putting data at the > wrong location. In the worst case there's a danger of going outside the > memory allocated for the texture. Ouch! Thanks for catching that. In v3, I will simply restrict this function to level 0 of BGRA 2D textures, thus avoiding image offsets altogether. Such a restriction won't diminish the benefit that this patch gives to Chrome OS performance, and that's the main reason I wrote this patch. Sure, it would be nice if this fastpath supported more texture image types, and we can add that additional support in the future, but right now I just want to get concrete improvements for Chrome OS commited to master. >> + intptr_t byte_offset = byte_offset_y + byte_offset_x; >> + if (intel->has_swizzling) { >> + bool bit6 = (byte_offset >> 6) & 1; >> + bool bit9 = (byte_offset >> 9) & 1; >> + bool bit10 = (byte_offset >> 10) & 1; >> + >> + byte_offset += 64 * (bit9 ^ bit10) * (1 - 2 * bit6); >> > > Would this be faster? > > byte_offset ^= ((byte_offset >> 3) ^ (byte_offset >> 4)) & (1 << 6); Clever. I'll take it. --- Thanks for the careful review. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev