----- Original Message ----- > On 12/05/2011 08:02 PM, Jose Fonseca wrote: > > > > > > ----- Original Message ----- > >> On 11/19/2011 09:44 PM, Marek Olšák wrote: > >>> Hi everyone, > >>> > >>> this patch series implements all the core Mesa and Gallium > >>> support > >>> for EXT_transform_feedback and ARB_transform_feedback2. It's been > >>> tested by me on Radeons and by Christoph Bumiller on Nouveau. I > >>> have verified that all transform feedback piglit tests (except > >>> the > >>> one that requires GLSL 1.3) pass with this code. > >>> > >>> Since we were discussing this last time, the Gallium interface > >>> has > >>> been slightly modified to make it easier to implement by drivers. > >>> The Gallium docs have been updated too. > >>> > >>> Some streamout-related softpipe and llvmpipe code for the old > >>> interface has been disabled. I didn't look at what needs to be > >>> done to support the new one. > >>> > >>> Please review. > >>> > >> > >> > >> No one cares ? Could we just push these now ? > > > > I'm not the ideal person to review (I was not involved in the > > original gallium implementation nor previous discussions about, > > and I don't know much about transform feedback in general), but I > > went through the changes FWIW. > > > > > >> gallium: interface changes necessary to implement transform > >> feedback (v4) > > [...] > > > >> The most notable change is the stream-out info must be linked > >> with a vertex or geometry shader and cannot be set > >> independently. > >> This is due to limitations of existing hardware (special shader > >> instructions must be used to write into stream-out buffers), > >> and it's also how OpenGL works (stream outputs must be > >> specified > >> prior to linking shaders). > > > > Is this true for all current gallium hardware? > > > > To varying degree, I'd say. > On nv50 and nvc0, stream output state can be set independently, but > it > would have to be linked ad-hoc, since if state wasn't created > together > with the shader you wouldn't know the hardware output slots that have > been chosen. > > In D3D10/11, stream output state is always specified through a > geometry > shader, but you can pass the code of a vertex shader on creation of > the > GS and then the implementation can either create a pass-through GS, > or, > if it can do stream output without GS active, just treat it as a > stream > output state object associated with a specific vertex shader. > Either way, stream output is always coupled with a shader object, > too. > > For Radeons you have to add extra instructions (depending on the SO > state) to the shader if it's supposed to stream output, so the OpenGL > way seems nicer for them.
Thanks for the detailed explanation. > I'd avoid having to rebuild/modify shader code on the fly wherever > possible, and using a passthrough GS probably costs enough > performance > to avoid it as well. Yes, makes sense. > > > > @@ -164,9 +170,28 @@ struct pipe_clip_state > > }; > > > > > > +/** > > + * Stream output for vertex transform feedback. > > + */ > > +struct pipe_stream_output_info > > +{ > > + /** number of the output buffer to insert each element into */ > > + unsigned output_buffer[PIPE_MAX_SHADER_OUTPUTS]; > > + /** which register OUT[i] to grab each output from */ > > + unsigned register_index[PIPE_MAX_SHADER_OUTPUTS]; > > + /** TGSI_WRITEMASK signifying which components to output */ > > + ubyte register_mask[PIPE_MAX_SHADER_OUTPUTS]; > > + /** number of outputs */ > > + unsigned num_outputs; > > > > > > + /** stride for an entire vertex, only used if all > > output_buffers are 0 */ > > + unsigned stride; > > +}; > > + > > > > This structure layout could be more space efficient. May be: > > > > unsigned stride; > > unsigned num_outputs; num_outputs should be first actually, as I suppose stride is irrelevant when num_outputs == 0. > > struct foo { > > unsigned output_buffer:???; > > unsigned register_index:???; > > unsigned register_mask:4; > > } [PIPE_MAX_SHADER_OUTPUTS]; > > > > Especially if less than 30bit ints are sufficient for > > register_index and output_bffer. This would allow to easily > > copy/hash only a portion of the state. > > > > 2 bits for buffer and 5 for register_index would satisfy > PIPE_MAX_OUTPUTS (32) and PIPE_MAX_SO_BUFFERS (4), so we're a long > way > from exhausting all 32 bits. This should be done then. > > + > > struct pipe_shader_state > > { > > const struct tgsi_token *tokens; > > + struct pipe_stream_output_info stream_output; > > }; > > > > Is this really necessary for all shader types? Would it be enough > > to mandate geometry shader, and add a new parameter to > > create_gs_state? Or maybe create a separate pipe_shader_state > > structure for this. Either wat looks cleaner and more > > light-weight than taxing all current pipe_shader_state users with > > this stage specific info. > > > > Necessary for all but COMPUTE and FRAGMENT (so, for VERTEX, HULL, > DOMAIN > and GEOMETRY, though I haven't checked whether nvc0 can actually run > with only a single one of HULL/DOMAIN). OK. That's the majority. In that case I'm not sure it's worth to do things differently for fragment/compute anymore. So let's leave it at this. > > BTW, there was at one point the plan to remove "struct > > pipe_shader_state", and have shader state to be simply "struct > > tgsi_token *". Even if we drop that plan now, I think that were > > several places that make this assumption and dropped the contents > > of pipe_shader_state, so reference to pipe_shader_state needs to > > be audited. But we really need to make sure that all pipe_shader_state users fill/preserve the stream_output member, in particular, I noticed these are missing: - state trackers in src/gallium/state_trackers/... were not updated to zero stream_output member - src/gallium/auxiliary/postprocess/pp_run.c ditto - src/gallium/drivers/trace/ & src/gallium/auxiliary/util/u_dump_state.c should be updated to dump the new stream_output member After these updates and the above compacting of pipe_stream_output_info I'm comfortable with the gallium changes. And, independently of this change, the (half dead) cso_context code for caching shaders should be axed. It never quite worked, and much less now. Jose _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev