Re: [Mesa-dev] Batch buffer sizes, flushing questions

Rogovin, Kevin Thu, 31 Oct 2013 02:23:19 -0700

Hi,

Thank you for the detailed answer; I now have still more questions:


> No.  do_flush_locked() (which is called by intel_batch_buffer_flush()) 
> follows that by calling either drm_intel_bo_mrb_exec() or 
> drm_intel_gem_bo_context_exec().  That's what causes the batch to be queued 
> for execution.

I think I am getting quite confused by the contents of intel_upload_finish(), 
for it has this:


   if (brw->upload.buffer_len) {
      drm_intel_bo_subdata(brw->upload.bo,
                           brw->upload.buffer_offset,
                           brw->upload.buffer_len,
                           brw->upload.buffer);
      brw->upload.buffer_len = 0;
   }

   drm_intel_bo_unreference(brw->upload.bo);
   brw->upload.bo = NULL;
}

whereas the batch buffer is represented by the member brw_context::batch (I 
think). What is the role of brw_context::upload? It looks like the size is 
limited to 4K, so what is it used to upload?

I can see those DRM execution commands in do_flush_locked(). That function's 
implementation is making me a touch confused too, for I see two uploads:


   if (brw->has_llc) {
      drm_intel_bo_unmap(batch->bo);
   } else {
      ret = drm_intel_bo_subdata(batch->bo, 0, 4*batch->used, batch->map);
      if (ret == 0 && batch->state_batch_offset != batch->bo->size) {
         ret = drm_intel_bo_subdata(batch->bo,
                                    batch->state_batch_offset,
                                    batch->bo->size - batch->state_batch_offset,
                                    (char *)batch->map + batch->state_batch_offset);
      }
   }

I understand the first: it uploads batch->used uint32_t's from batch->map to 
the DRM memory object. But I do not quite follow the second upload; what is 
the magic going on with batch->state_batch_offset and, for that matter, 
batch->bo->size?

Going further down, I see that if the command is a blit it uses a different 
execution DRM command. I have not been able to find a reference of what each 
different DRM command does, the best I have found so far are: 
http://lwn.net/Articles/283798/ [Keith Packard's Article/Thread on LWN]  and 
https://www.kernel.org/doc/htmldocs/drm/ ; when I start to dig into the source 
code of DRM for what those functions do, I find they are set as function 
pointers and the chase eventually leads me to some ioctl-like calls, but I 
still do not know what they do or how they differ. Is there a reference or doc 
saying what these functions are expected to do?


> nr_prims is sometimes != 1 when the client is using the legacy 
> glBegin()/glEnd() technique to emit primitives.  I don't recall the exact 
> circumstances that cause it to happen, but here's one example:
>
> glBegin(GL_LINE_STRIP);
> glArrayElement(...);
> ...
> glEnd();
> glBegin(GL_LINE_LOOP);
> glArrayElement(...);
> ...
> glEnd();

That PITA old school begin/end. If the context is core profile, does that then 
imply nr_prims is always 1?


> Not that I'm aware of.  My intuition is that since GL apps typically do a 
> very large number of small-ish draw calls, this wouldn't be beneficial most 
> of the time, and it would be tricky to tune the heuristics to make it 
> effective in the rare circumstances where it mattered without sacrificing 
> performance elsewhere.

By small-ish calls, do you mean the batch buffer is small or the vertex or 
fragment load is small? Generally speaking, developers are supposed to keep the 
number of glDrawFoo() calls under 1000 per frame; on embedded they are usually 
in for a world of hurt if they go over 500, and very often going over 300 ends 
up being CPU limited on many embedded platforms. The calls I am thinking of as 
"heavy"-ish are instanced calls with a large number of instances of 
non-trivial geometry; the most typical example is a field of grass.


> drm_intel_bo_busy() will tell if a buffer object is still being used by the 
> GPU.  Also, calling drm_intel_bo_map() on a buffer will cause the CPU to wait 
> until the GPU is done with the buffer.  (In the rare cases where we want to 
> map a buffer object without waiting for the GPU we use 
> drm_intel_gem_bo_map_unsynchronized()).

Just to check: are GL buffer objects and texture surfaces then implemented as 
DRM BOs? [Looking at the various functions specified in 
intelInitTextureSubImageFuncs, intelInitTextureImageFuncs and 
intelInitBufferObjectFuncs makes me guess so, but it is still just a guess.]

Looking at intel_bufferobj_subdata(), why does the change of buffer object data 
that is not used only happen async when brw_context::has_llc is true?
Also, why is preferring to stall more likely to hit that path than the delayed 
data blit?

Best Regards,
-Kevin

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
