On Sun, Aug 15, 2010 at 6:46 AM, keith whitwell <keith.whitw...@gmail.com> wrote:
> On Fri, Aug 13, 2010 at 5:25 PM, Chia-I Wu <olva...@gmail.com> wrote:
>> On Fri, Aug 13, 2010 at 11:35 PM, Keith Whitwell <kei...@vmware.com> wrote:
>>> On Fri, 2010-08-13 at 08:09 -0700, Chia-I Wu wrote:
>>>> On Fri, Aug 13, 2010 at 10:51 PM, Keith Whitwell <kei...@vmware.com> wrote:
>>>> > On Fri, 2010-08-13 at 07:46 -0700, Chia-I Wu wrote:
>>>> >> On Fri, Aug 13, 2010 at 10:14 PM, Keith Whitwell <kei...@vmware.com> wrote:
>>>> >> > On Fri, 2010-08-13 at 07:04 -0700, Chia-I Wu wrote:
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> There are two primitive transformations in the gallium draw module.
>>>> >> >> In varray, primitives are "split".  When a primitive has more
>>>> >> >> vertices than the middle end can handle, varray splits the
>>>> >> >> primitive and calls the middle end multiple times.
>>>> >> >>
>>>> >> >> In vcache, primitives are "decomposed".  More advanced primitives
>>>> >> >> are decomposed into one of point, line(_adj), or triangle(_adj).
>>>> >> >> Similarly, vcache may call the middle end multiple times to flush
>>>> >> >> its internal buffer.  In some cases, vcache passes the primitives
>>>> >> >> through without decomposing or splitting, as can be seen in
>>>> >> >> vcache_check_run.
>>>> >> >>
>>>> >> >> The issue with vcache is that it has to decompose a primitive
>>>> >> >> differently depending on the provoking convention, as explained in
>>>> >> >>
>>>> >> >>   http://lists.freedesktop.org/archives/mesa-dev/2010-August/001797.html
>>>> >> >>
>>>> >> >> It becomes a problem when GS is active.
>>>> >> >>
>>>> >> >> My proposal is to make vcache split instead of decompose.  Because
>>>> >> >> varray only splits and vcache has a pass-through path, the rest of
>>>> >> >> the workflow already has to support all primitive types.  Switching
>>>> >> >> from decompose to split does not require a big change to the rest
>>>> >> >> of the workflow.
>>>> >> >>
>>>> >> >> But then vcache will look a lot like varray, only with indexed
>>>> >> >> primitive support.  That leads me to a new frontend that replaces
>>>> >> >> both varray and vcache: vsplit
>>>> >> >>
>>>> >> >>   http://cgit.freedesktop.org/~olv/mesa/log/?h=draw-vsplit
>>>> >> >>
>>>> >> >> vsplit is based on varray.  It uses some code from vcache to
>>>> >> >> support indexed primitives.  When vcache decomposes, flags are set
>>>> >> >> to indicate whether the stipple counter should be reset or whether
>>>> >> >> some edge of a triangle should be omitted in unfilled mode.  The
>>>> >> >> segments of a split primitive have flags for similar purposes too:
>>>> >> >>
>>>> >> >>   DRAW_SPLIT_AFTER   More segments to come after this one
>>>> >> >>   DRAW_SPLIT_BEFORE  There are preceding segments
>>>> >> >>
>>>> >> >> These flags are set by vsplit, and the middle ends pass them on to
>>>> >> >> the other stages.  Therefore, the run methods of the middle ends
>>>> >> >> are augmented to take the flags.
>>>> >> >>
>>>> >> >> To summarize, vsplit
>>>> >> >>
>>>> >> >>  - fixes GS when (flatshade && flatshade_first) is on
>>>> >> >>  - never sends more vertices than the middle end claims to handle
>>>> >> >>  - is faster than vcache: split instead of decompose, no get_elt calls
>>>> >> >>  - no longer uses the higher bits of draw_elts for stipple/edge flags
>>>> >> >>
>>>> >> >> Suggestions?
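
To make the split flags concrete, here is a minimal, self-contained sketch of
the idea.  It is not the actual vsplit code: the flag values and the
run_segment()/split_linestrip() helpers are made up for illustration.  A long
line strip is cut into segments that each fit the middle end, adjacent
segments share a vertex so the strip stays connected, and DRAW_SPLIT_BEFORE /
DRAW_SPLIT_AFTER tell the later stages whether a segment has neighbours (so,
for example, the stipple counter need not be reset in the middle of the
original primitive).

   #include <stdio.h>

   #define DRAW_SPLIT_BEFORE (1 << 0)   /* there are preceding segments */
   #define DRAW_SPLIT_AFTER  (1 << 1)   /* more segments follow this one */

   /* stand-in for a middle-end run: just report what it was given */
   static void
   run_segment(unsigned flags, unsigned start, unsigned count)
   {
      printf("segment: start=%u count=%u%s%s\n", start, count,
             (flags & DRAW_SPLIT_BEFORE) ? " BEFORE" : "",
             (flags & DRAW_SPLIT_AFTER)  ? " AFTER"  : "");
   }

   /* split a line strip of 'count' vertices into segments of at most
    * 'max_verts' vertices; adjacent segments share one vertex
    */
   static void
   split_linestrip(unsigned start, unsigned count, unsigned max_verts)
   {
      unsigned pos = 0;

      if (count < 2 || max_verts < 2)
         return;   /* nothing drawable, or no progress possible */

      while (pos + 1 < count) {
         unsigned remaining = count - pos;
         unsigned seg = (remaining > max_verts) ? max_verts : remaining;
         unsigned flags = 0;

         if (pos > 0)
            flags |= DRAW_SPLIT_BEFORE;
         if (pos + seg < count)
            flags |= DRAW_SPLIT_AFTER;

         run_segment(flags, start + pos, seg);

         /* resume at the last vertex of this segment */
         pos += seg - 1;
      }
   }

   int main(void)
   {
      /* a 10-vertex line strip, middle end limited to 4 vertices per run */
      split_linestrip(0, 10, 4);
      return 0;
   }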
>>>> >> >
>>>> >> > Hi - I haven't looked at the patches yet, but a couple of questions:
>>>> >> >
>>>> >> > How does this interact with the draw_pipe_* code - which requires
>>>> >> > decomposed primitives?
>>>> >> draw_pipe.c decomposes the primitives.  It already had to, because it
>>>> >> has to support varray and vcache_check_run, which do not decompose.
>>>> >
>>>> > OK.
>>>> >
>>>> >> > How does this cope with indexed rendering where the vertex buffers
>>>> >> > themselves are too large (for hardware or some other entity)?  Eg.
>>>> >> > imagine the hardware could cope with up to 64k vertices, and you
>>>> >> > have a drawelements call randomly referencing vertices in range
>>>> >> > 0..128k?
>>>> >> Vertex fetching happens in the middle end, so the range of the
>>>> >> indices is not a problem.  vsplit does guarantee, though, that it
>>>> >> never calls the middle end with more vertices than the middle end
>>>> >> claims to support (as returned by draw_pt_middle_end::prepare).  The
>>>> >> limit is usually decided by the size of the buffer for vertex
>>>> >> emitting.
>>>> >
>>>> > I guess I'm wondering how it does this.  If the middle end says it
>>>> > supports 64k vertices, and the vertex element looks like
>>>> >
>>>> >   [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>>> >
>>>> > what gets sent?  (Sorry, I still haven't looked at the code, you could
>>>> > well have addressed this).
>>>> I see.  The frontend would set
>>>>
>>>>   fetch_elts = [0, 128k, 64k, 32k, 96k, 16k, 1, ... ]
>>>>   draw_elts  = [0, 1, 2, 3, 4, 5, 6, ...]
>>>>
>>>> fetch_elts is processed by the middle end, which fetches the given
>>>> vertices.  draw_elts is passed to draw_emit or the pipeline.  It is the
>>>> new index buffer, which indexes into the fetched vertices.
>>>>
>>>> It is actually the same as vcache.  So when fetch_elts is
>>>>
>>>>   [0, 128k, 64k, 64k, 128k, 16k, ...],
>>>>
>>>> draw_elts would be set to
>>>>
>>>>   [0, 1, 2, 2, 1, 3, ...]
>>>>
>>>> The number of elements to fetch (and shade) is minimized.
>>>
>>> Thanks Chia-I, I've taken a look at the code & this makes sense - the
>>> fetch/draw cache is still there, but specialized into 4 versions for
>>> each element type.  And it seems like you take some steps not to hit it
>>> unnecessarily.
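
The remapping described above can be illustrated with a small, self-contained
sketch.  It is not the real vsplit code -- the real frontend keeps a cache
specialized per element type, while a plain linear search and the made-up
remap_indices() name keep this example short.  The point is the same: each
distinct user index is fetched and shaded once, and draw_elts is rebuilt as
ushorts that index into the fetched vertices.

   #include <stdio.h>

   #define MAX_SEGMENT 1024   /* illustrative segment size */

   static unsigned fetch_elts[MAX_SEGMENT];       /* user indices to fetch */
   static unsigned short draw_elts[MAX_SEGMENT];  /* new index buffer */

   static void
   remap_indices(const unsigned *in, unsigned count,
                 unsigned *num_fetch, unsigned *num_draw)
   {
      unsigned nf = 0, nd = 0, i, j;

      for (i = 0; i < count; i++) {
         /* is this user index already queued for fetching? */
         for (j = 0; j < nf; j++) {
            if (fetch_elts[j] == in[i])
               break;
         }
         if (j == nf)
            fetch_elts[nf++] = in[i];   /* new vertex: fetch it once */
         draw_elts[nd++] = (unsigned short) j;
      }
      *num_fetch = nf;
      *num_draw = nd;
   }

   int main(void)
   {
      /* the example from the thread, with k = 1024 */
      const unsigned k = 1024;
      const unsigned in[6] = { 0, 128 * k, 64 * k, 64 * k, 128 * k, 16 * k };
      unsigned nf, nd, i;

      remap_indices(in, 6, &nf, &nd);

      printf("fetch_elts:");
      for (i = 0; i < nf; i++)
         printf(" %u", fetch_elts[i]);
      printf("\ndraw_elts: ");
      for (i = 0; i < nd; i++)
         printf(" %u", draw_elts[i]);
      printf("\n");
      /* prints: fetch_elts: 0 131072 65536 16384
       *         draw_elts:  0 1 2 2 1 3
       */
      return 0;
   }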
>>> I'm coming up to speed on it though, so a couple more questions - for
>>> fan primitives, it seems like you always end up in the segment_cache
>>> code -- is that true, or is there a fastpath I missed?  In particular,
>>> if the whole fan fits within the limits of the middle end, will it
>>> still end up going through the cache?
>> Yes, if it exceeds vsplit's limit (SEGMENT_SIZE).
>>> Actually it looks like this happens in an early-out at the bottom of
>>> the patch:
>>>
>>> +   /* no splitting required */
>>> +   if (count <= max_count_simple) {
>>> +      SEGMENT_SIMPLE(0x0, start, count);
>>> +   }
>>>
>>> where max_count_simple is either
>>>
>>>    vsplit->max_vertices
>>> or
>>>    vsplit->segment_size (for indexed primitives)
>>>
>>> These in turn are generated as:
>>>
>>> +   middle->prepare(middle, vsplit->prim, opt, &vsplit->max_vertices);
>>> +
>>> +   vsplit->segment_size = MIN2(SEGMENT_SIZE, vsplit->max_vertices);
>>>
>>> and SEGMENT_SIZE is 1024.
>>>
>>> So any indexed primitive where the number of vertices (or is it the
>>> number of indices?) exceeds 1024 will end up on the cache path?
>>> I know this used to be true as well -- just wondering if there is a way
>>> to improve on this...
>> max_count_simple is set to the segment size (<= 1024) because the
>> middle end expects draw_elts to be of type ushort.  vsplit needs to use
>> its internal fixed-size buffer when index_size != 2.
>>
>> The limit may be lifted for index_size == 2.  The attached patch should
>> relax the limit (untested, as it is getting late here :-).  Another way
>> that comes to my mind now is to make the internal buffer dynamically
>> sized, and make SEGMENT_SIZE a large limit on the dynamic size.
>>
> I think this all makes a great follow-on change, but as a first step
> vsplit looks very nice - a welcome cleanup of the existing code.

Great.  I've committed the branch to master.
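
For reference, the kind of check being discussed might look roughly like the
following.  Like the patch it refers to, this is an untested sketch;
run_simple() and run_cached() are placeholder names, not real draw functions.
The idea is that ushort indices could be handed to the middle end directly up
to its own vertex limit, while other index sizes still have to be translated
into ushort draw_elts in the internal buffer and so stay bounded by the
segment size.

   #include <stdio.h>

   static void run_simple(unsigned flags, unsigned start, unsigned count)
   { printf("simple path: start=%u count=%u\n", start, count); }

   static void run_cached(unsigned start, unsigned count)
   { printf("cache path:  start=%u count=%u\n", start, count); }

   static void
   choose_path(unsigned index_size, unsigned count,
               unsigned max_vertices, unsigned segment_size)
   {
      /* ushort indices need no translation, so only the middle end's own
       * limit applies; everything else goes through the internal buffer
       */
      unsigned max_count_simple =
         (index_size == 2) ? max_vertices : segment_size;

      if (count <= max_count_simple)
         run_simple(0x0, 0, count);   /* no splitting required */
      else
         run_cached(0, count);        /* go through the segment cache */
   }

   int main(void)
   {
      choose_path(2, 30000, 65536, 1024);  /* ushort indices: simple path */
      choose_path(4, 30000, 65536, 1024);  /* uint indices: cache path */
      return 0;
   }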
--
o...@lunarg.com