On Mon, Jan 31, 2011 at 3:17 PM, Keith Whitwell <kei...@vmware.com> wrote:
> On Sat, 2011-01-29 at 15:12 -0800, Marek Olšák wrote: > > > > > > Hi, > > > > I am proposing to add a pointer to a user buffer in pipe_resource. > > There are two reasons for this: > > > > 1) I would like to have a way to query outside of a driver whether a > > buffer is a user buffer. Simply comparing the pointer with NULL would > > do the trick. > > > > 2) I would like to efficiently obtain a pointer to a user buffer > > outside of a driver without going through the sequence of functions > > get_transfer, transfer_map, transfer_unmap, and transfer_destroy. > > > > This will allow to move more driver-specific code to auxiliary/util. > > > > > > > Marek, > > I have to say my preference would have been to see user buffers fade > away in favour of things like inline transfers. That said you're much > more active than I am in looking at this right now, so I don't want to > get in the way of your progress. > > I guess my biggest problem with user buffers is how poorly defined their > semantics are. For instance, what does it really mean to get write > transfer into a userbuffer? Will you be updating the original > application-owned memory? > Hi Keith, >From my point of view, user buffers are just pointers passed through the Gallium interface and are well-defined from what I can see. They might be owned by the application (e.g. set via glVertexPointer etc.), therefore using the transfer functions on user buffers is invalid per se. Moreover, the application may change the content of user buffers any time, meaning that drivers should convert the user buffers to real buffers in the draw_vbo function, then draw vertices, and then forget the real buffers, keeping the user buffers bound for the next draw operation. Drivers should not upload user buffers anywhere else, because the application may change the contents between glDraw* calls. If they are bound as vertex buffers, we don't need to know their size and sometimes we even can't (again, glVertexPointer etc.). Instead, we can use pipe_draw_info::min_index and max_index and upload only that range. This has proved to be a great optimization and it's how r300g and r600g work now. In theory, doing user buffer uploads at the state tracker side using inline transfers might work and should remove some burden from drivers. In practice, inline transfers may have a very large overhead compared to how things work now. In transfer_inline_write, you usually want to map the buffer, do a memcpy, and unmap it. The map/unmap overhead can be really significant. There are applications that use glDrawElements to draw one triangle at a time, and they draw hundreds of triangles with user buffers in this way (yes, applications really do this). We can't afford doing any more work than is absolutely necessary. When you get 10000 or more draw_vbo calls per second, everything matters. Currently, the radeon drivers have one upload buffer for vertices and it stays mapped until the command stream is flushed. When they get a user buffer, they do one memcpy and that's all. They don't touch winsys unless the upload buffer is full. > And user-buffers tend not to stay user-buffers - they can be promoted to > regular buffers behind the scenes by the driver. Would that be > reflected in this interface somehow? > I don't think it's needed. The pipe_resource fields can stay immutable and drivers can internally replace vertex buffers with their private pipe_resources. The state trackers don't need to know about it. > From the point of view of recording, replaying, debugging, remoting, > etc. at the gallium boundary, it's preferable if all actions are > interposable - ie. all actions are mediated by a function call of some > sort into the gallium interface. Giving a component a direct memory > access into buffer contents would tend to defeat that and make > record/replay of that action difficult. > Indeed, record/replay would be difficult but not impossbie. FWIW I think the interface shouldn't be specifically designed for record/replay. Instead, record/replay should be made work with whatever interface there is. > Is it possible to get a description of what you're doing at a slightly > higher level to try and understand if there's a solution without these > drawbacks? > I am quite content with the current interface for user buffers, but it will need a few changes for it to be more... efficient. Below is my plan for further improving Gallium and its interactions with user buffers in general. 1) New vertex buffer manager This is how I'd like to put the burden of uploading user buffers out of the drivers. I've made a new vertex buffer manager. It can be found here: http://cgit.freedesktop.org/~mareko/mesa/commit/?h=vbuf-mgr&id=94a53b672dd238e6a50bb6b252614dc2e9f30ddf And the corresponding branch is here: http://cgit.freedesktop.org/~mareko/mesa/log/?h=vbuf-mgr It's a module that drivers can use and it does 2 things: - uploads user buffers - takes care of converting unsupported vertex formats and unaligned vertex layouts to supported ones (there are vertex fetcher capability bits, see struct u_vbuf_caps) Besides some typos in a few commits, this work is already done. With this manager, the drivers don't have to deal with user buffers when they are bound as vertex buffers. They only get real hardware buffers. The drivers don't even have to deal with unsupported vertex formats. Moreover, this code has already proven to be fast when it was maturing in r300g and is specifically optimized for low overhead. Its integration to a driver is easy and straithforward. But I need the pipe_resource::user_ptr variable to be able to access the user buffer memory in util (efficiently). 2) Optimizing vertex array state changes in st/mesa Because we don't need to know the sizes of user vertex buffers (as I said, we can use pipe_draw_info::min_index and max_index instead, as is done in the vertex buffer manager), we can remove the calculation of the buffer sizes from st_draw_vbo. We can also remove pipe_vertex_buffer::max_index (again, the information provided by pipe_draw_info is sufficient) and the related code from st/mesa. Not only will this simplify st_draw_vbo, it will allow us to bind vertex buffers and a vertex elements state only when needed, i.e. when either the _NEW_ARRAY or _NEW_PROGRAM flag is set. It makes this usage case a lot faster: for (i = 0; i < 1000; i++) glDrawElements(...); This work is *almost* complete and can be found here: http://cgit.freedesktop.org/~mareko/mesa/log/?h=gallium-varrays-optim The framerate in the Torcs game goes from 14 to 20 fps with this code (in one particular scenario I was optimizing for). Since I no longer compute the sizes of user buffers, I have to put ~0 in the 'size' parameter of user_buffer_create and I expect drivers not to use it. r300g, r600g, nv50, nvc0, softpipe, and llvmpipe should be ready for this. Not sure about the other drivers, but it's fixable. -- So this all is my current plan to simplify hardware drivers a bit and add some nice optimizations. Another option would be to move the new vertex buffer manager or something equivalent to the state tracker and remove user buffers from the Gallium interface, but that would be additional work with uncertain performance implications, so I decided not to take this path (for now). Best regards Marek
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev