date:20131031

Re: [Mesa-dev] [PATCH 3/4] glsl: add ir_cache class and functions for shader serialization

2013-10-31 Thread Tapani Pälli


On 10/29/2013 03:48 PM, Paul Berry wrote:
On 29 October 2013 00:53, Tapani > wrote:


On 10/28/2013 10:09 PM, Paul Berry wrote:


+static uint32_t _unique_id(ir_variable *var)
+{
+   char buffer[256];
+   _mesa_snprintf(buffer, 256, "%s_%p", var->name, var);
+   return _mesa_str_checksum(buffer);


Two problems:

1. This is a lot of work to obtain a unique identifier.  The
pointer is already unique; we should just use that.



I was originally using a pointer but then there's the 32 vs 64bit
address problem. I wanted to avoid usage of pointers.



2. There's no guarantee that the checksum of the string will be
unique. Because of the birthday paradox, checksum collisions are
probably going to happen all the time.



I think there is because the pointer address is unique and is used
as part of the string.


Yes, but the checksumming process destroys uniqueness (it has to, 
because the output of the checksum operation is a 32-bit int, and the 
input is a string.  There are more than 2^32 possible strings, so by 
the pigeon hole principle at least one checksum must correspond to 
more than one possible input string).  In the case of 
_mesa_str_checksum() (which is not a particularly good hash from a 
cryptographic perspective) the collisions are pretty easy to come by.  
For example, these were the first two collisions I found by writing a 
quick test program:


   printf("checksum of \"%s\" is 0x%x\n", "foo_003c",
  _mesa_str_checksum("foo_003c"));
   printf("checksum of \"%s\" is 0x%x\n", "foo_01a8",
  _mesa_str_checksum("foo_01a8"));

checksum of "foo_003c" is 0x1360
checksum of "foo_01a8" is 0x1360



OK, I'll put the address itself back in use then.





Instead of going to a lot of work to generate a unique id for
each ir_variable, I think we should just serialize the
ir_variable pointer address.  If we are trying to use the same
binary format for 32-bit and 64-bit builds (see my clarifying
question at the top of this email), we'll need to extend the
pointer to 64 bits before serializing it.

Alternatively, if you want to keep the unique ID's small, we'll
have to use some hashtable mechanism to map each variable to its
own ID.


OK, I will verify if this is a problem. For me it does not seem
like a problem as I'm using the address there.



+}
+
+
+/* data required to serialize ir_variable */
+struct ir_cache_variable_data


As with glsl_type_data, I don't understand why we need to have
this class rather than just adding a serialize() method to the
ir_variable class.


This could be a serialize function in ir_cache_serialize.cpp. I
did not want to touch existing ir classes, I think it helps to
have the cache code separated and in one place. This is of course
debatable, every ir instruction could have serialize() and
unserialize() to deal with this.


My chief worry with separating them is that people will forget to 
update the serialization code when they change the ir class, because 
the thing they need to update will be far away.  This is particularly 
a problem when adding new members to the class (something we do quite 
frequently when implementing new GLSL features).


I understand and I've been worried about this from the maintainability 
point of view as well. I've thought to add some documentation on the 
ir.h file about this.


// Tapani

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Batch buffer sizes, flushing questions

2013-10-31 Thread Rogovin, Kevin

Hi,

 Thankyou for the detailed answer, and now I have still more questions:

> No.  do_flush_locked() (which is called by intel_batch_buffer_flush()) 
> follows that by calling either drm_intel_bo_mrb_exec() or 
> drm_intel_gem_bo_context_exec().  That's what > causes the batch to be queued 
> for execution.

I think I am getting quite confused by the contents of intel_upload_finish(), 
for it has this:


   53
   54if (brw->upload.buffer_len) {
   55   drm_intel_bo_subdata(brw->upload.bo,
   56brw->upload.buffer_offset,
   57brw->upload.buffer_len,
   58brw->upload.buffer);
   59   brw->upload.buffer_len = 0;
   60}
   61
   62drm_intel_bo_unreference(brw->upload.bo);
   63brw->upload.bo = NULL;
   64 }

where as the batch buffer is represented by the member brw_context::batch (I 
think). What is the role of brw_context::upload? It looks like the size is 
limited to 4K, so what is it used to upload?

I can see those DRM execution commands in do_flush_locked(). That function's 
implementation is making me a touch confused too, for I see two uploads:


  244
  245if (brw->has_llc) {
  246   drm_intel_bo_unmap(batch->bo);
  247} else {
  248   ret = drm_intel_bo_subdata(batch->bo, 0, 4*batch->used, batch->map);
  249   if (ret == 0 && batch->state_batch_offset != batch->bo->size) {
  250  ret = drm_intel_bo_subdata(batch->bo,
  251 batch->state_batch_offset,
  252 batch->bo->size - batch->state_batch_offset,
  253 (char *)batch->map + batch->state_batch_offset);
  254   }
  255}

I understand the first "uploads batch->used uint32_t's from batch->map to the 
DRM memory object", but I do not quite follow the second upload; what is the 
magicks going on with batch->state_batch_offset and for that matter 
batch->bo->size ??

Going further down, I see that if the command is a blit it uses a different 
execution DRM command. I have not been able to find a reference of what each 
different DRM command does, the best I have found so far are: 
http://lwn.net/Articles/283798/ [Keith Packard's Article/Thread on LWN]  and 
https://www.kernel.org/doc/htmldocs/drm/ ; when I start to dig into the source 
code of DRM for what those functions do, I find they are set as function 
pointers and the chase eventually leads me to some ioctl like calls, but I 
still do not know what they do and the differences. Is there a reference or doc 
saying what these functions are expected to do?


> nr_prims is sometimes != 1 when the client is using the legacy 
> glBegin()/glEnd() technique to emit primitives.  I don't recall the exact 
> circumstances that cause it to happen, but
> here's one example:
>
> glBegin(GL_LINE_STRIP);
> glArrayElement(...);
> ...
> glEnd();
> glBegin(GL_LINE_LOOP);
> glArrayElement(...);
> ...
> glEnd();

That PITA old school begin/end. If the context is core profile, does that then 
imply nr_prims is always 1?


> Not that I'm aware of.  My intuition is that since GL apps typically do a 
> very large number of small-ish draw calls, this wouldn't be beneficial most 
> of the time, and it would be
> tricky to tune the heuristics to make it effective in the rare circumstances 
> where it mattered without sacrificing performance elsewhere.

By small-ish calls, do you mean the batch buffer is small or the vertex or 
fragment load is small? Generally speaking, developers are supposed to keep the 
number of glDrawFoo() calls under 1000 per frame; on embedded they are in for a 
world of hurt if they go over 500 usually, and very often over 300 ends up 
being CPU limited on many embedded platforms. The calls that I am thinking that 
are "heavy"-ish are instanced calls where there are a large number of instances 
of non-trivial geometry, the most typical example is a field of grass.


> drm_intel_bo_busy() will tell if a buffer object is still being used by the 
> GPU.  Also, calling drm_intel_bo_map() on a buffer will cause the CPU to wait 
> until the GPU is done
> with the buffer.  (In the rare cases where we want to map a buffer object 
> without waiting for the GPU we use drm_intel_gem_bo_map_unsynchronized()).

Just to check: are then GL buffer objects and texture surfaces implemented as 
DRM BO's? [Looking at the various functions specified in 
intelInitTextureSubImageFuncs,  intelInitTextureImageFuncs and 
intelInitBufferObjectFuncs makes me guess so, but it still is just a guess].

Looking at intel_bufferobj_subdata(), why does the change of buffer object data 
that is not used only happen async when brw_context::has_llc true?
Also why is preferring to stall more likely to hit that path than the delayed 
data blit?

Best Regards,
-Kevin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] Uniform Buffer Object implementation questions

2013-10-31 Thread Rogovin, Kevin

Hi all,

 I was looking at the UBO implementation for Gen7 and following the trail of 
brw_vs_ubo_surfaces (found in brw_vs_surface_state.c ). The part that made me 
wonder was the following trail:


  834   /* Because behavior for referencing outside of the binding's size 
in the
  835* glBindBufferRange case is undefined, we can just bind the whole 
buffer
  836* glBindBufferBase wants and be a correct implementation.
  837*/
  838   brw->vtbl.create_constant_surface(brw, bo, binding->Offset,
  839 bo->size - binding->Offset,
  840 &surf_offsets[i],
  841 shader->Type == 
GL_FRAGMENT_SHADER);

in  brw_wm_surface_state.c, function brw_upload_ubo_surfaces(). When looking at 
what Gen7 sets create_constant_surface, it is gen7_create_constant_surface 
which has this comment

"Create the constant buffer surface. Vertex/fragment shader constants will be 
read from this buffer with Data Port Read instructions/messages."

I am not yet very familiar with the labels for different parts of the hardware 
or their functionality. So I am guessing here. Is the Data Port Read able to 
essentially read anything in the GTT? Are the values cached? Going further, if 
it can do so, does that not mean that the size of a UBO can be arbitrarily big? 
On other GL implementations, the size of UBO is usually quite limited so that 
the contents of a UBO can essentially be mostly, if not fully, cached. The idea 
was that TBO would be slower but essentially unlimited size and UBO's faster 
(and formatted) but with much less size.

Any help that can point me in the correct would be much appreciated.

Best Regards
 -Kevin Rogovin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Batch buffer sizes, flushing questions

2013-10-31 Thread Abdiel Janulgue

On Thursday, October 31, 2013 09:22:23 AM Rogovin, Kevin wrote:

> but I do not quite follow the second upload; what
> is the magicks going on with batch->state_batch_offset and for that matter
> batch->bo->size ??

This is stack and heap model for batchbuffer submission. Direct state, which 
is usually composed of the commands, is allocated at the beginning and 
indirect (dynamic) state data is allocated from the end of the batchbuffer. 
The batch->state_batch_offset is the start location of the indirect state data

-Abdiel

> 
> Going further down, I see that if the command is a blit it uses a different
> execution DRM command. I have not been able to find a reference of what
> each different DRM command does, the best I have found so far are:
> http://lwn.net/Articles/283798/ [Keith Packard's Article/Thread on LWN] 
> and https://www.kernel.org/doc/htmldocs/drm/ ; when I start to dig into the
> source code of DRM for what those functions do, I find they are set as
> function pointers and the chase eventually leads me to some ioctl like
> calls, but I still do not know what they do and the differences. Is there a
> reference or doc saying what these functions are expected to do?
> > nr_prims is sometimes != 1 when the client is using the legacy
> > glBegin()/glEnd() technique to emit primitives.  I don't recall the exact
> > circumstances that cause it to happen, but here's one example:
> > 
> > glBegin(GL_LINE_STRIP);
> > glArrayElement(...);
> > ...
> > glEnd();
> > glBegin(GL_LINE_LOOP);
> > glArrayElement(...);
> > ...
> > glEnd();
> 
> That PITA old school begin/end. If the context is core profile, does that
> then imply nr_prims is always 1?
> > Not that I'm aware of.  My intuition is that since GL apps typically do a
> > very large number of small-ish draw calls, this wouldn't be beneficial
> > most of the time, and it would be tricky to tune the heuristics to make
> > it effective in the rare circumstances where it mattered without
> > sacrificing performance elsewhere.
> By small-ish calls, do you mean the batch buffer is small or the vertex or
> fragment load is small? Generally speaking, developers are supposed to keep
> the number of glDrawFoo() calls under 1000 per frame; on embedded they are
> in for a world of hurt if they go over 500 usually, and very often over 300
> ends up being CPU limited on many embedded platforms. The calls that I am
> thinking that are "heavy"-ish are instanced calls where there are a large
> number of instances of non-trivial geometry, the most typical example is a
> field of grass.
> > drm_intel_bo_busy() will tell if a buffer object is still being used by
> > the GPU.  Also, calling drm_intel_bo_map() on a buffer will cause the CPU
> > to wait until the GPU is done with the buffer.  (In the rare cases where
> > we want to map a buffer object without waiting for the GPU we use
> > drm_intel_gem_bo_map_unsynchronized()).
> Just to check: are then GL buffer objects and texture surfaces implemented
> as DRM BO's? [Looking at the various functions specified in
> intelInitTextureSubImageFuncs,  intelInitTextureImageFuncs and
> intelInitBufferObjectFuncs makes me guess so, but it still is just a
> guess].
> 
> Looking at intel_bufferobj_subdata(), why does the change of buffer object
> data that is not used only happen async when brw_context::has_llc true?
> Also why is preferring to stall more likely to hit that path than the
> delayed data blit?
> 
> Best Regards,
> -Kevin
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Batch buffer sizes, flushing questions

2013-10-31 Thread Rogovin, Kevin



>> but I do not quite follow the second upload; what
>> is the magicks going on with batch->state_batch_offset and for that matter
>> batch->bo->size ??

>This is stack and heap model for batchbuffer submission. Direct state, which
>is usually composed of the commands, is allocated at the beginning and
>indirect (dynamic) state data is allocated from the end of the batchbuffer.
>The batch->state_batch_offset is the start location of the indirect state data

By indirect state, do you mean addresses for things (like shader binary, image 
and buffer object sources)? I admit that I see nothing about the ideas of stack 
(last in is first out) or heap (biggest is out first) about that. What is being 
pushed (stack) or in priority (usual use case for heaps)? 

Assuming it is addresses, is it just putting addresses of the data sources used 
in a draw call within a batch, do those values -need- to be in the end? is that 
need from the hardware or for the kernel? is the offset into those addresses 
also packed into the batch buffer? if so where? also, what is the address space 
of the addresses? GTT values? Usermode values [which means kernel converts]? 
something else?

If it is not addresses, what is it?

-Kevin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Batch buffer sizes, flushing questions

2013-10-31 Thread Abdiel Janulgue

On Thursday, October 31, 2013 12:38:37 PM Rogovin, Kevin wrote:
> >> but I do not quite follow the second upload; what
> >> is the magicks going on with batch->state_batch_offset and for that
> >> matter
> >> batch->bo->size ??
> >
> >This is stack and heap model for batchbuffer submission. Direct state,
> >which is usually composed of the commands, is allocated at the beginning
> >and indirect (dynamic) state data is allocated from the end of the
> >batchbuffer. The batch->state_batch_offset is the start location of the
> >indirect state data
> By indirect state, do you mean addresses for things (like shader binary,
> image and buffer object sources)? 

Usually, indirect state are objects that are referenced by the commands. A 
shader binary image itself can be classified as indirect state.

> I admit that I see nothing about the
> ideas of stack (last in is first out) or heap (biggest is out first) about
> that. What is being pushed (stack) or in priority (usual use case for
> heaps)?

Stack/heap in process memory management sense. The direct state grows from the 
bottom and the indirect state goes down from the top.

> 
> Assuming it is addresses, is it just putting addresses of the data sources
> used in a draw call within a batch, do those values -need- to be in the
> end? is that need from the hardware or for the kernel? is the offset into
> those addresses also packed into the batch buffer? if so where? also, what
> is the address space of the addresses? GTT values? Usermode values [which
> means kernel converts]? something else?

It depends, for some commands it could be an offset size from a buffer object 
itself. Sometimes they could be virtual addresses (offset from the aperture). 
Then there are commands that require physical addresses, AFAIK, I think only 
the kernel directly uses them, not the userspace...

> 
> If it is not addresses, what is it?
> 
> -Kevin
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] RFC: Haswell resource streamer/hw-generated binding tables (v2)

2013-10-31 Thread Abdiel Janulgue

On Thursday, October 10, 2013 01:04:08 PM Eric Anholt wrote:
> 
> My basic comment on resource streamer: We need performance data showing
> that it is a win before we commit it.  I'm not planning on reviewing the
> changes until we get that data.

Okay, I revisited the series, did some additional optimizations where the RS 
was able to cut surface state emission by up to 50% but without requiring a 
separate bo for cache (I need to clean it up first before re-submitting). And 
here finally, some light in the end of the tunnel:

x score_glb2_7_TREX_resource_streamer_ON
+ score_glb2_7_TREX_resource_streamer_OFF

N   Min   MaxMedian   AvgStddev
x  31686969 68.7741940.42502372
+  31686968 68.1935480.40160966
Difference at 95.0% confidence
-0.580645 +/- 0.210049
-0.844278% +/- 0.305419%
(Student's t, pooled s = 0.413482)

++

Anyway, I realized that I have been using GLB which only samples small number 
of surfaces in a shader. It's quite shortsighted. This explains why 
performance results are not very apparent in those benchmarks when enabling 
hw-generated binding tables.

I think the resource streamer can really shine in cases where shaders sample 
much larger number of surfaces since it is able to skip uploading the binding 
table data using the CPU. I'll try to experiment next with benchmarks that has 
shaders requiring larger amounts of sampled surfaces. If you have suggestions 
for certain benchmarks that do this, please let me know!

-abdiel

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Batch buffer sizes, flushing questions

2013-10-31 Thread Paul Berry

On 31 October 2013 02:22, Rogovin, Kevin  wrote:

>Hi,
>
>  Thankyou for the detailed answer, and now I have still more questions:
>
>  > No.  do_flush_locked() (which is called by intel_batch_buffer_flush())
> follows that by calling either drm_intel_bo_mrb_exec() or
> drm_intel_gem_bo_context_exec().  That's what > causes the batch to be
> queued for execution.
>
> I think I am getting quite confused by the contents of
> intel_upload_finish(), for it has this:
>
>
>53
>54if (brw->upload.buffer_len) {
>55   drm_intel_bo_subdata(brw->upload.bo,
>56brw->upload.buffer_offset,
>57brw->upload.buffer_len,
>58brw->upload.buffer);
>59   brw->upload.buffer_len = 0;
>60}
>61
>62drm_intel_bo_unreference(brw->upload.bo);
>63brw->upload.bo = NULL;
>64 }
>
> where as the batch buffer is represented by the member brw_context::batch
> (I think). What is the role of brw_context::upload?
>

The one and only role of brw_context::upload is to handle vertex and index
data that is passed into OpenGL using client pointers by clients that
aren't using ARB_vertex_buffer_object to manaage their vertex and index
data in GPU buffers.  At the time of the draw call, the driver has to copy
any such client-owned data into a newly-allocated buffer object for
consumption by the hardware.  brw->upload.bo is that buffer object.  The
remaining fields in brw->upload are a bookkeeping mechanism to allow small
chunks of vertex and index data to be shared in the same buffer object so
that we don't waste memory.

> It looks like the size is limited to 4K, so what is it used to upload?
>

Actually there are two mechanisms used by intel_upload_data() to store data
in brw->upload.bo, depending on the size of the data being stored.  If the
size is 4k or more, then it is uploaded directly to brw->upload.bo using
drm_intel_bo_subdata().  If it's less than 4k, it is memcpy'ed to
brw->upload.buffer, which acts as a temporary staging area; when
brw->upload.buffer gets full (or it's time to flush the batch), the
contents of brw->upload.buffer are uploaded to brw->upload.bo using
drm_intel_bo_subdata().  This allows us to reduce the overhead of calling
drm_intel_bo_subdata() when uploading tiny chunks of data at a time.

>
> I can see those DRM execution commands in do_flush_locked(). That
> function's implementation is making me a touch confused too, for I see two
> uploads:
>
>
>   244
>   245if (brw->has_llc) {
>   246   drm_intel_bo_unmap(batch->bo);
>   247} else {
>   248   ret = drm_intel_bo_subdata(batch->bo, 0, 4*batch->used,
> batch->map);
>   249   if (ret == 0 && batch->state_batch_offset != batch->bo->size) {
>   250  ret = drm_intel_bo_subdata(batch->bo,
>   251 batch->state_batch_offset,
>   252 batch->bo->size - batch->state_batch_offset,
>   253 (char *)batch->map + batch->state_batch_offset);
>   254   }
>   255}
>
> I understand the first "uploads batch->used uint32_t's from batch->map to
> the DRM memory object", but I do not quite follow the second upload; what
> is the magicks going on with batch->state_batch_offset and for that matter
> batch->bo->size ??
>

It looks like Abdiel answered this question, but to provide some more
context: we actually store more data in the batch buffer than just the
stuff that the hardware docs would describe as "batch commands".  We also
use the batch buffer to store a lot of smaller data structures that are
pointed to by batch commands, such as surface states, binding tables, cc
("color calculator") state, blend state, and so on (the docs typically
refer to this kind of stuff as "dynamic state").  The way we achieve this
is the "stack and heap" model: the batch commands are allocated starting at
the start of the batch buffer and moving towards higher addresses (like the
heap in a conventional UNIX process), and the dynamic state is allocated
starting at the end of the batch buffer and moving towards lower addresses
(like the stack in a conventional UNIX process).  batch->used is the "top
of heap pointer" (the buffer offset where the next batch command will be
written), and batch->state_batch_offset is the "stack pointer" (the buffer
offset where the most recently written piece of dynamic state is located).
When these pointers meet in the middle of the batch, we know we are out of
room in the batch buffer so we flush it.  To confuse you, batch->used is
measured in uint32's, and state_batch_offset is measured in bytes.

Now, to explain the code you've quoted above.  If the hardware has an LLC
("last level cache"), then the way we build this batch buffer is by mapping
the buffer object into CPU memory and writing to it directly.  So when
we're ready to send the batch to the hardware, all we need to do is unmap
it and we're ready to go.  However, if the har

Re: [Mesa-dev] Batch buffer sizes, flushing questions

2013-10-31 Thread Paul Berry

On 31 October 2013 05:38, Rogovin, Kevin  wrote:

>
>
> >> but I do not quite follow the second upload; what
> >> is the magicks going on with batch->state_batch_offset and for that
> matter
> >> batch->bo->size ??
>
> >This is stack and heap model for batchbuffer submission. Direct state,
> which
> >is usually composed of the commands, is allocated at the beginning and
> >indirect (dynamic) state data is allocated from the end of the
> batchbuffer.
> >The batch->state_batch_offset is the start location of the indirect state
> data
>
> By indirect state, do you mean addresses for things (like shader binary,
> image and buffer object sources)? I admit that I see nothing about the
> ideas of stack (last in is first out) or heap (biggest is out first) about
> that. What is being pushed (stack) or in priority (usual use case for
> heaps)?
>
> Assuming it is addresses, is it just putting addresses of the data sources
> used in a draw call within a batch, do those values -need- to be in the
> end? is that need from the hardware or for the kernel?


No, there's no need for them to be at the end.  It's just our little trick
to allow us to use the same buffer object for both batch commands and
indirect state.


> is the offset into those addresses also packed into the batch buffer? if
> so where? also, what is the address space of the addresses? GTT values?
> Usermode values [which means kernel converts]? something else?
>

>From the GPU's point of view, all addresses are relative to the GTT.  The
address of most indirect state is specified using two values: a base
pointer and an offset.  The base pointer is specified using the
STATE_BASE_ADDRESS command, and the offset is specified in whatever command
is trying to establish the state in question (for example, if we're talking
about the scissor rectangle, it's the 3DSTATE_SCISSOR_STATE_POINTERS
command).  Different types of data use different base address pointers, but
for all of the data we're talking about right now, we always set the base
address pointers to the start of the batch buffer.  Then, in the individual
commands, the offset we supply is simply the offset to the data within the
batch buffer.  So, for example, if you look at gen6_upload_scissor_state(),
you'll see that it allocates the indirect state for the scissor rectangle
using brw_state_batch().  That gives it back an offset
(scissor_state_offset) telling it the offset (relative to the batch) where
the indirect state is.  Then it uses BEGIN_BATCH(), OUT_BATCH(), and
ADVANCE_BATCH() to send a 3DSTATE_SCISSOR_STATE_POINTERS command containing
the offset.

The kernel is in charge of deciding where to locate each buffer object in
the GTT, and it decides this after Mesa has submitted the batch.  So Mesa
can't program the STATE_BASE_ADDRESSes directly.  Instead, it gives the
kernel a data structure full of "relocations", which tell the kernel how to
patch up things like the STATE_BASE_ADDRESS command once the GTT addresses
have been decided.  So, for example, if you look at
upload_state_base_address(), you'll see that it outputs the base addresses
using the OUT_RELOC() macro.  This emits a speculative address into the
batch (based on the GTT address that the batch buffer had last time it was
used), and tells the kernel how to fix it up if the batch buffer winds up
getting relocated to a different GTT address.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Tagging the 10.0 fork for git-describe

2013-10-31 Thread Chad Versace


On 10/30/2013 02:16 PM, Paul Berry wrote:

On 30 October 2013 14:13, Ian Romanick  wrote:


On 10/30/2013 01:01 PM, Chad Versace wrote:

For the sake of git-describe, I propose we tag the 10.0 fork
point on master as 'mesa-10.0-fork', 'mesa-10.0-base', or
something similair. How do people feel about that?


I support this idea.

To avoid confusion with tags on the branch, I think the tag should
reflect where master is going.  mesa-10.next?  mesa-10.1-devel?  I'm
open to suggestions.  I'll update the version with the first patch after
the tag.



I like the sound of mesa-10.1-devel.


Me too. I like mesa-10.1-devel.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/6] radeonsi: report our border color behavior

2013-10-31 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 0431ab0..f222f7d 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -357,10 +357,12 @@ static int r600_get_param(struct pipe_screen* pscreen, 
enum pipe_cap param)
case PIPE_CAP_CUBE_MAP_ARRAY:
case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
-   case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
 case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
return 0;
 
+   case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
+   return PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_R600;
+
/* Stream output. */
case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
return has_streamout ? 4 : 0;
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/6] radeonsi: implement ARB_vertex_type_2_10_10_10_rev

2013-10-31 Thread Marek Olšák

From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 0d74344..72368d8 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1362,6 +1362,13 @@ static uint32_t si_translate_buffer_dataformat(struct 
pipe_screen *screen,
if (type == UTIL_FORMAT_TYPE_FIXED)
return V_008F0C_BUF_DATA_FORMAT_INVALID;
 
+   if (desc->nr_channels == 4 &&
+   desc->channel[0].size == 10 &&
+   desc->channel[1].size == 10 &&
+   desc->channel[2].size == 10 &&
+   desc->channel[3].size == 2)
+   return V_008F0C_BUF_DATA_FORMAT_2_10_10_10;
+
/* See whether the components are of the same size. */
for (i = 0; i < desc->nr_channels; i++) {
if (desc->channel[first_non_void].size != desc->channel[i].size)
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/6] radeonsi: implement texture buffer objects

2013-10-31 Thread Marek Olšák

From: Marek Olšák 

GLSL 1.40 is done.
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  11 ++-
 src/gallium/drivers/radeonsi/radeonsi_shader.c |  64 --
 src/gallium/drivers/radeonsi/si_state.c| 116 +++--
 3 files changed, 137 insertions(+), 54 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c 
b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index f222f7d..b79a58e 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -327,6 +327,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_TGSI_INSTANCEID:
case PIPE_CAP_COMPUTE:
+   case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
return 1;
 
case PIPE_CAP_TEXTURE_MULTISAMPLE:
@@ -342,7 +343,12 @@ static int r600_get_param(struct pipe_screen* pscreen, 
enum pipe_cap param)
return 256;
 
case PIPE_CAP_GLSL_FEATURE_LEVEL:
-   return 130;
+   return 140;
+
+   case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
+   return 1;
+   case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
+   return MIN2(rscreen->b.info.vram_size, 0x);
 
/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -355,9 +361,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
case PIPE_CAP_CUBE_MAP_ARRAY:
-   case PIPE_CAP_TEXTURE_BUFFER_OBJECTS:
-   case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
-case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
return 0;
 
case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c 
b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 0e15881..7523cb4 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -1189,13 +1189,34 @@ static void tex_fetch_args(
const struct tgsi_full_instruction * inst = emit_data->inst;
unsigned opcode = inst->Instruction.Opcode;
unsigned target = inst->Texture.Texture;
-   unsigned sampler_src, sampler_index;
LLVMValueRef coords[4];
LLVMValueRef address[16];
int ref_pos;
unsigned num_coords = tgsi_util_get_texture_coord_dim(target, &ref_pos);
unsigned count = 0;
unsigned chan;
+   unsigned sampler_src = emit_data->inst->Instruction.NumSrcRegs - 1;
+   unsigned sampler_index = 
emit_data->inst->Src[sampler_src].Register.Index;
+
+   if (target == TGSI_TEXTURE_BUFFER) {
+   LLVMTypeRef i128 = LLVMIntTypeInContext(gallivm->context, 128);
+   LLVMTypeRef v2i128 = LLVMVectorType(i128, 2);
+   LLVMTypeRef i8 = LLVMInt8TypeInContext(gallivm->context);
+   LLVMTypeRef v16i8 = LLVMVectorType(i8, 16);
+
+   /* Truncate v32i8 to v16i8. */
+   LLVMValueRef res = si_shader_ctx->resources[sampler_index];
+   res = LLVMBuildBitCast(gallivm->builder, res, v2i128, "");
+   res = LLVMBuildExtractElement(gallivm->builder, res, 
bld_base->uint_bld.zero, "");
+   res = LLVMBuildBitCast(gallivm->builder, res, v16i8, "");
+
+   emit_data->dst_type = LLVMVectorType(bld_base->base.elem_type, 
4);
+   emit_data->args[0] = res;
+   emit_data->args[1] = bld_base->uint_bld.zero;
+   emit_data->args[2] = lp_build_emit_fetch(bld_base, 
emit_data->inst, 0, 0);
+   emit_data->arg_count = 3;
+   return;
+   }
 
/* Fetch and project texture coordinates */
coords[3] = lp_build_emit_fetch(bld_base, emit_data->inst, 0, 
TGSI_CHAN_W);
@@ -1267,9 +1288,6 @@ static void tex_fetch_args(
 "");
}
 
-   sampler_src = emit_data->inst->Instruction.NumSrcRegs - 1;
-   sampler_index = emit_data->inst->Src[sampler_src].Register.Index;
-
/* Adjust the sample index according to FMASK.
 *
 * For uncompressed MSAA surfaces, FMASK should return 0x76543210,
@@ -1430,6 +1448,15 @@ static void build_tex_intrinsic(const struct 
lp_build_tgsi_action * action,
struct lp_build_context * base = &bld_base->base;
char intr_name[23];
 
+   if (emit_data->inst->Texture.Texture == TGSI_TEXTURE_BUFFER) {
+   emit_data->output[emit_data->chan] = build_intrinsic(
+   base->gallivm->builder,
+   "llvm.SI.vs.load.input", emit_data->dst_type,
+   emit_data->args, emit_data->arg_count,
+   LLVMReadNoneAttribute | LLVMNoUnwindAttribute);
+   return;
+   }
+
sprintf(intr_name, "%sv%ui32",

[Mesa-dev] [PATCH 6/6] mesa: fix NUM_COMPRESSED_TEXTURE_FORMATS query

2013-10-31 Thread Marek Olšák

From: Marek Olšák 

Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/main/texcompress.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
index e71d0c4..7393d3f 100644
--- a/src/mesa/main/texcompress.c
+++ b/src/mesa/main/texcompress.c
@@ -257,11 +257,12 @@ _mesa_get_compressed_formats(struct gl_context *ctx, 
GLint *formats)
if (ctx->Extensions.EXT_texture_compression_s3tc) {
   if (formats) {
  formats[n++] = GL_COMPRESSED_RGB_S3TC_DXT1_EXT;
+ formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT;
  formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT;
  formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT;
   }
   else {
- n += 3;
+ n += 4;
   }
}
 
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/6] docs/GL3: document radeonsi support, minor cleanup

2013-10-31 Thread Marek Olšák

From: Marek Olšák 

---
 docs/GL3.txt | 180 +--
 1 file changed, 90 insertions(+), 90 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index ff28ea6..0408266 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -20,45 +20,45 @@ Feature   Status
 
 GL 3.0:
 
-GLSL 1.30 DONE
+GLSL 1.30 DONE (i965, r600, 
radeonsi)
 glBindFragDataLocation, glGetFragDataLocation DONE
-Conditional rendering (GL_NV_conditional_render)  DONE (i965, r300, r600, 
swrast)
-Map buffer subranges (GL_ARB_map_buffer_range)DONE (i965, r300, r600, 
swrast)
-Clamping controls (GL_ARB_color_buffer_float) DONE (i965, r300, r600)
-Float textures, renderbuffers (GL_ARB_texture_float)  DONE (i965, r300, r600)
-GL_EXT_packed_float   DONE (i965, r600)
-GL_EXT_texture_shared_exponentDONE (i965, r600, swrast)
-Float depth buffers (GL_ARB_depth_buffer_float)   DONE (i965, r600)
-Framebuffer objects (GL_ARB_framebuffer_object)   DONE (i965, r300, r600, 
swrast)
-Half-floatDONE
-Non-normalized Integer texture/framebuffer formatsDONE (i965, r600)
-1D/2D Texture arrays  DONE
-Per-buffer blend and masks (GL_EXT_draw_buffers2) DONE (i965, r600, swrast)
-GL_EXT_texture_compression_rgtc   DONE (i965, r300, r600, 
swrast)
-Red and red/green texture formats DONE (i965, swrast, 
gallium)
-Transform feedback (GL_EXT_transform_feedback)DONE (i965, r600)
-Vertex array objects (GL_APPLE_vertex_array_object)   DONE (i965, r300, r600, 
swrast)
-sRGB framebuffer format (GL_EXT_framebuffer_sRGB) DONE (i965, r600)
+Conditional rendering (GL_NV_conditional_render)  DONE (i965, r300, r600, 
radeonsi, swrast)
+Map buffer subranges (GL_ARB_map_buffer_range)DONE (i965, r300, r600, 
radeonsi, swrast)
+Clamping controls (GL_ARB_color_buffer_float) DONE (i965, r300, r600, 
radeonsi)
+Float textures, renderbuffers (GL_ARB_texture_float)  DONE (i965, r300, r600, 
radeonsi)
+GL_EXT_packed_float   DONE (i965, r600, 
radeonsi)
+GL_EXT_texture_shared_exponentDONE (i965, r600, 
radeonsi, swrast)
+Float depth buffers (GL_ARB_depth_buffer_float)   DONE (i965, r600, 
radeonsi)
+Framebuffer objects (GL_ARB_framebuffer_object)   DONE (i965, r300, r600, 
radeonsi, swrast)
+Half-floatDONE (i965, r300, r600, 
radeonsi, swrast)
+Non-normalized Integer texture/framebuffer formatsDONE (i965, r600, 
radeonsi)
+1D/2D Texture arrays  DONE (i965, r600, 
radeonsi)
+Per-buffer blend and masks (GL_EXT_draw_buffers2) DONE (i965, r600, 
radeonsi, swrast)
+GL_EXT_texture_compression_rgtc   DONE (i965, r300, r600, 
radeonsi, swrast)
+Red and red/green texture formats DONE (i965, r300, r600, 
radeonsi, swrast)
+Transform feedback (GL_EXT_transform_feedback)DONE (i965, r600, 
radeonsi)
+Vertex array objects (GL_APPLE_vertex_array_object)   DONE (all drivers)
+sRGB framebuffer format (GL_EXT_framebuffer_sRGB) DONE (i965, r600, 
radeonsi)
 glClearBuffer commandsDONE
 glGetStringi command  DONE
 glTexParameterI, glGetTexParameterI commands  DONE
 glVertexAttribI commands  DONE
-Depth format cube texturesDONE
+Depth format cube texturesDONE (i965, r600, 
radeonsi)
 GLX_ARB_create_context (GLX 1.4 is required)  DONE
 
 
 GL 3.1:
 
-GLSL 1.40 DONE (i965, r600)
-Forward compatibile context support/deprecations  DONE (i965, r600)
-Instanced drawing (GL_ARB_draw_instanced) DONE (i965, gallium, 
swrast)
-Buffer copying (GL_ARB_copy_buffer)   DONE (i965, r300, r600, 
swrast)
-Primitive restart (GL_NV_primitive_restart)   DONE (i965, r600)
-16 vertex texture image units DONE
-Texture buffer objs (GL_ARB_texture_buffer_object)DONE for OpenGL 3.1 
contexts (i965, r600)
-Rectangular textures (GL_ARB_texture_rectangle)   DONE (i965, r300, r600, 
swrast)
-Uniform buffer objs (GL_ARB_uniform_buffer_object)DONE (i965, r600, swrast)
-Signed normalized textures (GL_EXT_texture_snorm) DONE (i965, r300, r600)
+GLSL 1.40 DONE (i965, r600, 
radeonsi)
+Forward compatibile context support/deprecations  DONE (i965, r600, 
radeonsi)
+Instanced drawing (GL_ARB_draw_instanced) DONE (i965, r600, 
radeonsi, swrast)
+Buffer copying (GL_ARB_copy_buffer)   DONE

[Mesa-dev] [PATCH 3/6] r600g, radeonsi: properly expose texture buffer formats

2013-10-31 Thread Marek Olšák

From: Marek Olšák 

This exposes GL_ARB_texture_buffer_object_rgb32.
---
 src/gallium/drivers/r600/evergreen_state.c | 11 ---
 src/gallium/drivers/r600/r600_state.c  | 11 ---
 src/gallium/drivers/radeonsi/si_state.c| 11 ---
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 4535d21..a4a4e3e 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -711,9 +711,14 @@ boolean evergreen_is_format_supported(struct pipe_screen 
*screen,
}
}
 
-   if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
-   r600_is_sampler_format_supported(screen, format)) {
-   retval |= PIPE_BIND_SAMPLER_VIEW;
+   if (usage & PIPE_BIND_SAMPLER_VIEW) {
+   if (target == PIPE_BUFFER) {
+   if (r600_is_vertex_format_supported(format))
+   retval |= PIPE_BIND_SAMPLER_VIEW;
+   } else {
+   if (r600_is_sampler_format_supported(screen, format))
+   retval |= PIPE_BIND_SAMPLER_VIEW;
+   }
}
 
if ((usage & (PIPE_BIND_RENDER_TARGET |
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index f148052..41e9c5d 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -649,9 +649,14 @@ boolean r600_is_format_supported(struct pipe_screen 
*screen,
}
}
 
-   if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
-   r600_is_sampler_format_supported(screen, format)) {
-   retval |= PIPE_BIND_SAMPLER_VIEW;
+   if (usage & PIPE_BIND_SAMPLER_VIEW) {
+   if (target == PIPE_BUFFER) {
+   if (r600_is_vertex_format_supported(format))
+   retval |= PIPE_BIND_SAMPLER_VIEW;
+   } else {
+   if (r600_is_sampler_format_supported(screen, format))
+   retval |= PIPE_BIND_SAMPLER_VIEW;
+   }
}
 
if ((usage & (PIPE_BIND_RENDER_TARGET |
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 36aff00..0d74344 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -1497,9 +1497,14 @@ boolean si_is_format_supported(struct pipe_screen 
*screen,
}
}
 
-   if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
-   si_is_sampler_format_supported(screen, format)) {
-   retval |= PIPE_BIND_SAMPLER_VIEW;
+   if (usage & PIPE_BIND_SAMPLER_VIEW) {
+   if (target == PIPE_BUFFER) {
+   if (si_is_vertex_format_supported(screen, format))
+   retval |= PIPE_BIND_SAMPLER_VIEW;
+   } else {
+   if (si_is_sampler_format_supported(screen, format))
+   retval |= PIPE_BIND_SAMPLER_VIEW;
+   }
}
 
if ((usage & (PIPE_BIND_RENDER_TARGET |
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/1]: Preparing for ARB_viewport_array

2013-10-31 Thread Courtney Goeltzenleuchter

The following patch will begin the process of adding ARB_viewport_array
to Mesa. Next will be to extend the gl_context Scissor and Viewport
attributes to hold multiple viewport, scissor and scissor enables.
Then the DI side of ARB_viewport_array.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] mesa: Change driver interface for ARB_viewport_array

2013-10-31 Thread Courtney Goeltzenleuchter

Add the index parameter to the Scissor, Viewport and
DepthRange driver methods. Update i965 and Gallium
to the change. Index always 0.
---
 src/mesa/drivers/common/driverfuncs.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_context.c |  4 ++--
 src/mesa/drivers/dri/i965/brw_context.h |  2 +-
 src/mesa/drivers/dri/swrast/swrast.c|  3 ++-
 src/mesa/main/dd.h  | 10 +++---
 src/mesa/main/scissor.c |  2 +-
 src/mesa/main/viewport.c|  4 ++--
 src/mesa/state_tracker/st_cb_viewport.c |  3 ++-
 8 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/mesa/drivers/common/driverfuncs.c 
b/src/mesa/drivers/common/driverfuncs.c
index 5faa98a..e45dc0e 100644
--- a/src/mesa/drivers/common/driverfuncs.c
+++ b/src/mesa/drivers/common/driverfuncs.c
@@ -299,7 +299,7 @@ _mesa_init_driver_state(struct gl_context *ctx)
ctx->Driver.LogicOpcode(ctx, ctx->Color.LogicOp);
ctx->Driver.PointSize(ctx, ctx->Point.Size);
ctx->Driver.PolygonStipple(ctx, (const GLubyte *) ctx->PolygonStipple);
-   ctx->Driver.Scissor(ctx, ctx->Scissor.X, ctx->Scissor.Y,
+   ctx->Driver.Scissor(ctx, 0, ctx->Scissor.X, ctx->Scissor.Y,
ctx->Scissor.Width, ctx->Scissor.Height);
ctx->Driver.ShadeModel(ctx, ctx->Light.ShadeModel);
ctx->Driver.StencilFuncSeparate(ctx, GL_FRONT,
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 3880e18..5b4d662d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -125,13 +125,13 @@ intelGetString(struct gl_context * ctx, GLenum name)
 }
 
 static void
-intel_viewport(struct gl_context *ctx, GLint x, GLint y, GLsizei w, GLsizei h)
+intel_viewport(struct gl_context *ctx, GLuint idx, GLint x, GLint y, GLsizei 
w, GLsizei h)
 {
struct brw_context *brw = brw_context(ctx);
__DRIcontext *driContext = brw->driContext;
 
if (brw->saved_viewport)
-  brw->saved_viewport(ctx, x, y, w, h);
+  brw->saved_viewport(ctx, idx, x, y, w, h);
 
if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) {
   dri2InvalidateDrawable(driContext->driDrawablePriv);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3be2138..c261ae8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1411,7 +1411,7 @@ struct brw_context
 
__DRIcontext *driContext;
struct intel_screen *intelScreen;
-   void (*saved_viewport)(struct gl_context *ctx,
+   void (*saved_viewport)(struct gl_context *ctx, GLuint idx,
   GLint x, GLint y, GLsizei width, GLsizei height);
 };
 
diff --git a/src/mesa/drivers/dri/swrast/swrast.c 
b/src/mesa/drivers/dri/swrast/swrast.c
index bfa2efd..ffb1fa0 100644
--- a/src/mesa/drivers/dri/swrast/swrast.c
+++ b/src/mesa/drivers/dri/swrast/swrast.c
@@ -618,7 +618,8 @@ update_state( struct gl_context *ctx, GLuint new_state )
 }
 
 static void
-viewport(struct gl_context *ctx, GLint x, GLint y, GLsizei w, GLsizei h)
+viewport(struct gl_context *ctx, GLuint idx,
+ GLint x, GLint y, GLsizei w, GLsizei h)
 {
 struct gl_framebuffer *draw = ctx->WinSysDrawBuffer;
 struct gl_framebuffer *read = ctx->WinSysReadBuffer;
diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 5011921..7f57a39 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -479,7 +479,8 @@ struct dd_function_table {
/** Enable or disable writing into the depth buffer */
void (*DepthMask)(struct gl_context *ctx, GLboolean flag);
/** Specify mapping of depth values from NDC to window coordinates */
-   void (*DepthRange)(struct gl_context *ctx, GLclampd nearval, GLclampd 
farval);
+   void (*DepthRange)(struct gl_context *ctx, GLuint idx,
+  GLclampd nearval, GLclampd farval);
/** Specify the current buffer for writing */
void (*DrawBuffer)( struct gl_context *ctx, GLenum buffer );
/** Specify the buffers for writing for fragment programs*/
@@ -519,7 +520,9 @@ struct dd_function_table {
/** Set rasterization mode */
void (*RenderMode)(struct gl_context *ctx, GLenum mode );
/** Define the scissor box */
-   void (*Scissor)(struct gl_context *ctx, GLint x, GLint y, GLsizei w, 
GLsizei h);
+   void (*Scissor)(struct gl_context *ctx, GLuint idx,
+   GLint x, GLint y,
+   GLsizei width, GLsizei height);
/** Select flat or smooth shading */
void (*ShadeModel)(struct gl_context *ctx, GLenum mode);
/** OpenGL 2.0 two-sided StencilFunc */
@@ -541,7 +544,8 @@ struct dd_function_table {
 struct gl_texture_object *texObj,
 GLenum pname, const GLfloat *params);
/** Set the viewport */
-   void (*Viewport)(struct gl_context *ctx, GLint x, GLint y, GLsizei w, 
GLsizei h);
+   void (*Viewport)(struct gl_context *ctx, GLuint idx,
+GLint x, GLint y, GLsizei w, GLsizei

Re: [Mesa-dev] [PATCH] i965: Fix compiler warning.

2013-10-31 Thread Chad Versace


+Ken, because he has fought the dragons of GLboolean and know
where lie the invisible dragons.

On 10/30/2013 02:58 PM, Courtney Goeltzenleuchter wrote:

fix: intel_screen.c:1320:4: warning: initialization from
incompatible pointer type [enabled by default]
---
  src/mesa/drivers/dri/i965/brw_context.c | 2 +-
  src/mesa/drivers/dri/i965/brw_context.h | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 2a923c4..5b4d662d 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -498,7 +498,7 @@ brw_process_driconf_options(struct brw_context *brw)
driQueryOptionb(options, "disable_glsl_line_continuations");
  }

-bool
+GLboolean
  brwCreateContext(gl_api api,
 const struct gl_config *mesaVis,
 __DRIcontext *driContextPriv,
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 7f7d5c2..c261ae8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1446,7 +1446,7 @@ void intel_prepare_render(struct brw_context *brw);
  void intel_resolve_for_dri2_flush(struct brw_context *brw,
__DRIdrawable *drawable);

-bool brwCreateContext(gl_api api,
+GLboolean brwCreateContext(gl_api api,
  const struct gl_config *mesaVis,
  __DRIcontext *driContextPriv,
unsigned major_version,



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/6] mesa: fix NUM_COMPRESSED_TEXTURE_FORMATS query

2013-10-31 Thread Brian Paul

On Thu, Oct 31, 2013 at 9:42 AM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/mesa/main/texcompress.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
> index e71d0c4..7393d3f 100644
> --- a/src/mesa/main/texcompress.c
> +++ b/src/mesa/main/texcompress.c
> @@ -257,11 +257,12 @@ _mesa_get_compressed_formats(struct gl_context *ctx, 
> GLint *formats)
> if (ctx->Extensions.EXT_texture_compression_s3tc) {
>if (formats) {
>   formats[n++] = GL_COMPRESSED_RGB_S3TC_DXT1_EXT;
> + formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT;
>   formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT;
>   formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT;
>}
>else {
> - n += 3;
> + n += 4;
>}
> }
>

Actually, I believe the code is correct as-is.  There's a comment
above the function that says:

 * Some formats are \b not returned by this function.  The
 * \c GL_COMPRESSED_TEXTURE_FORMATS query only returns formats that are
 * "suitable for general-purpose usage."  All texture compression extensions
 * have taken this to mean either linear RGB or linear RGBA.
 *

I'd have to dig up the spec language, but NVIDIA also omits
GL_COMPRESSED_RGBA_S3TC_DXT1_EXT when you do that query.

-Brian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/6] docs/GL3: document radeonsi support, minor cleanup

2013-10-31 Thread Ian Romanick

On 10/31/2013 08:42 AM, Marek Olšák wrote:
> From: Marek Olšák 

A couple small comments below.  With those fixed,

Reviewed-by: Ian Romanick 

> ---
>  docs/GL3.txt | 180 
> +--
>  1 file changed, 90 insertions(+), 90 deletions(-)
> 
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index ff28ea6..0408266 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -20,45 +20,45 @@ Feature   
> Status
>  
>  GL 3.0:
>  
> -GLSL 1.30 DONE
> +GLSL 1.30 DONE (i965, r600, 
> radeonsi)
>  glBindFragDataLocation, glGetFragDataLocation DONE
> -Conditional rendering (GL_NV_conditional_render)  DONE (i965, r300, 
> r600, swrast)
> -Map buffer subranges (GL_ARB_map_buffer_range)DONE (i965, r300, 
> r600, swrast)
> -Clamping controls (GL_ARB_color_buffer_float) DONE (i965, r300, r600)
> -Float textures, renderbuffers (GL_ARB_texture_float)  DONE (i965, r300, r600)
> -GL_EXT_packed_float   DONE (i965, r600)
> -GL_EXT_texture_shared_exponentDONE (i965, r600, 
> swrast)
> -Float depth buffers (GL_ARB_depth_buffer_float)   DONE (i965, r600)
> -Framebuffer objects (GL_ARB_framebuffer_object)   DONE (i965, r300, 
> r600, swrast)
> -Half-floatDONE
> -Non-normalized Integer texture/framebuffer formatsDONE (i965, r600)
> -1D/2D Texture arrays  DONE
> -Per-buffer blend and masks (GL_EXT_draw_buffers2) DONE (i965, r600, 
> swrast)
> -GL_EXT_texture_compression_rgtc   DONE (i965, r300, 
> r600, swrast)
> -Red and red/green texture formats DONE (i965, swrast, 
> gallium)
> -Transform feedback (GL_EXT_transform_feedback)DONE (i965, r600)
> -Vertex array objects (GL_APPLE_vertex_array_object)   DONE (i965, r300, 
> r600, swrast)
> -sRGB framebuffer format (GL_EXT_framebuffer_sRGB) DONE (i965, r600)
> +Conditional rendering (GL_NV_conditional_render)  DONE (i965, r300, 
> r600, radeonsi, swrast)
> +Map buffer subranges (GL_ARB_map_buffer_range)DONE (i965, r300, 
> r600, radeonsi, swrast)
> +Clamping controls (GL_ARB_color_buffer_float) DONE (i965, r300, 
> r600, radeonsi)
> +Float textures, renderbuffers (GL_ARB_texture_float)  DONE (i965, r300, 
> r600, radeonsi)
> +GL_EXT_packed_float   DONE (i965, r600, 
> radeonsi)
> +GL_EXT_texture_shared_exponentDONE (i965, r600, 
> radeonsi, swrast)
> +Float depth buffers (GL_ARB_depth_buffer_float)   DONE (i965, r600, 
> radeonsi)
> +Framebuffer objects (GL_ARB_framebuffer_object)   DONE (i965, r300, 
> r600, radeonsi, swrast)
> +Half-floatDONE (i965, r300, 
> r600, radeonsi, swrast)
> +Non-normalized Integer texture/framebuffer formatsDONE (i965, r600, 
> radeonsi)
> +1D/2D Texture arrays  DONE (i965, r600, 
> radeonsi)
> +Per-buffer blend and masks (GL_EXT_draw_buffers2) DONE (i965, r600, 
> radeonsi, swrast)
> +GL_EXT_texture_compression_rgtc   DONE (i965, r300, 
> r600, radeonsi, swrast)
> +Red and red/green texture formats DONE (i965, r300, 
> r600, radeonsi, swrast)
> +Transform feedback (GL_EXT_transform_feedback)DONE (i965, r600, 
> radeonsi)
> +Vertex array objects (GL_APPLE_vertex_array_object)   DONE (all drivers)
> +sRGB framebuffer format (GL_EXT_framebuffer_sRGB) DONE (i965, r600, 
> radeonsi)
>  glClearBuffer commandsDONE
>  glGetStringi command  DONE
>  glTexParameterI, glGetTexParameterI commands  DONE
>  glVertexAttribI commands  DONE
> -Depth format cube texturesDONE
> +Depth format cube texturesDONE (i965, r600, 
> radeonsi)
>  GLX_ARB_create_context (GLX 1.4 is required)  DONE
>  
>  
>  GL 3.1:
>  
> -GLSL 1.40 DONE (i965, r600)
> -Forward compatibile context support/deprecations  DONE (i965, r600)
> -Instanced drawing (GL_ARB_draw_instanced) DONE (i965, gallium, 
> swrast)
> -Buffer copying (GL_ARB_copy_buffer)   DONE (i965, r300, 
> r600, swrast)
> -Primitive restart (GL_NV_primitive_restart)   DONE (i965, r600)
> -16 vertex texture image units DONE
> -Texture buffer objs (GL_ARB_texture_buffer_object)DONE for OpenGL 3.1 
> contexts (i965, r600)
> -Rectangular textures (GL_ARB_texture_rectangle)   DONE (i965, r300, 
> r600, swrast)
> -Uniform buffer objs (GL_ARB_uniform_buffer_object)DONE (i965, r600, 
> swrast)
> -Signed normalized textures (GL_EXT_texture_snorm) DONE (i9

Re: [Mesa-dev] [PATCH 6/6] mesa: fix NUM_COMPRESSED_TEXTURE_FORMATS query

2013-10-31 Thread Ian Romanick

On 10/31/2013 08:42 AM, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Cc: mesa-sta...@lists.freedesktop.org

This is not correct.  GL_COMPRESSED_TEXTURE_FORMATS is only supposed to
return texture formats "suitable for general-purpose usage."  Since
GL_COMPRESSED_RGBA_S3TC_DXT1_EXT only has 1 bit for alpha, it's not
suitable for general-purpose usage.  As far as I'm aware, nobody else
returns that format either... and I think we have a piglit test.

I just noticed that the most recent version of the spec adds it to the
state tables.  I'll file a spec bug.

> ---
>  src/mesa/main/texcompress.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
> index e71d0c4..7393d3f 100644
> --- a/src/mesa/main/texcompress.c
> +++ b/src/mesa/main/texcompress.c
> @@ -257,11 +257,12 @@ _mesa_get_compressed_formats(struct gl_context *ctx, 
> GLint *formats)
> if (ctx->Extensions.EXT_texture_compression_s3tc) {
>if (formats) {
>   formats[n++] = GL_COMPRESSED_RGB_S3TC_DXT1_EXT;
> + formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT;
>   formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT;
>   formats[n++] = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT;
>}
>else {
> - n += 3;
> + n += 4;
>}
> }
>  
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH V2 11/12] i965/gen7: Enable the features required for GL_ARB_sample_shading

2013-10-31 Thread Anuj Phogat

On Tue, Oct 29, 2013 at 9:29 PM, Paul Berry  wrote:

> On 29 October 2013 21:28, Paul Berry  wrote:
>
>> On 29 October 2013 17:16, Anuj Phogat  wrote:
>>
>>>
>>>
>>>
>>>  On Tue, Oct 29, 2013 at 4:31 PM, Paul Berry wrote:
>>>
>>>
 I think the right thing to do is to add:

 if (dispatch_width == 16)
 fail("...");

 to whatever parts of the visitor you aren't confident will work
 properly in SIMD16.

>>> Yes. But that only covers the per sample shading enabled due to
>>> gl_SampleID and gl_SamplePosition.
>>>
>>> It still doesn't cover the case when there's no  gl_SampleID or
>>>  gl_SamplePosition in fragment shader but GL_SAMPLE_SHADING is enabled.
>>> To disable simd16 for this case I either need to keep the relevant
>>> changes I made in gen7_wm_state.c
>>> or
>>> may be set  simd16_instructions = null in brw_wm_fs_emit() when
>>> GL_SAMPLE_SHADING is enabled. It should also be safe to
>>> call _mesa_get_min_invocations_per_fragment() in brw_wm_fs_emit(). So, we
>>> can take care of all the cases of sample shading enabled at the same place?
>>>
>>
>> Oh, I didn't realize that you were getting hangs with SIMD16 mode even
>> when neither gl_SampleID nor gl_SamplePosition is used.
>>
>> A large part of the reason I had suggested putting calls to fail() in
>> brw_fs_visitor was because I thought the hangs were due to improper code
>> generation.  Since you're getting hangs just by using GL_SAMPLE_SHADING,
>> the cause mustn't be improper code generation, so my idea of failing out of
>> SIMD16 compilation doesn't make sense anymore.  Based on this information I
>> guess I'm ok with the patch as originally written.
>>
>
>
> BTW, on the off chance that it helps you get unstuck with your SIMD16
> hangs, have you noticed that you have to change KSP[0] (kernel start
> pointer 0) in order to get "SIMD16 only" mode to work?  Currently we always
> emit 3DSTATE_PS like this:
>
>BEGIN_BATCH(8);
>OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
>OUT_BATCH(brw->wm.base.prog_offset); /* KSP[0] */
>OUT_BATCH(dw2);
>OUT_BATCH(dw3);
>OUT_BATCH(dw4);
>OUT_BATCH(dw5);
>OUT_BATCH(0); /* KSP[1] */
>OUT_BATCH(brw->wm.base.prog_offset +
> brw->wm.prog_data->prog_offset_16); /* KSP[2] */
>ADVANCE_BATCH();
>
> That's always worked in the past because (see Graphics BSpec:
> 3D-Media-GPGPU Engine > 3D Pipeline Stages > Pixel > Pixel Shader Thread
> Generation > Pixel Grouping (Dispatch Size) Control):
> - In "SIMD8 only" mode, KSP[0] is the SIMD8 program, and KSP[1] and KSP[2]
> are ignored.
> - In "SIMD8+SIMD16" mode, KSP[0] is the SIMD8 program, KSP[1] is ignored,
> and KSP[2] is the SIMD16 program.
>
> However, in "SIMD16 only" mode, KSP[0] needs to be the SIMD16 program.
>
> (my apologies if you've already accounted for this--it just seems like a
> likely explanation for "SIMD16 only" GPU hangs)
>
Last time I tried this didn' work well probably due to other bugs in my
implementation. But it worked this time :). Thanks for making me try it.
Although I've already received r-b on all of my patches, I'll soon send out
V3 of my series with required changes to enable SIMD16 as well.
I'll do a full piglit run to verify the changes.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/6] mesa: fix NUM_COMPRESSED_TEXTURE_FORMATS query

2013-10-31 Thread Marek Olšák

On Thu, Oct 31, 2013 at 6:19 PM, Ian Romanick  wrote:
> On 10/31/2013 08:42 AM, Marek Olšák wrote:
>> From: Marek Olšák 
>>
>> Cc: mesa-sta...@lists.freedesktop.org
>
> This is not correct.  GL_COMPRESSED_TEXTURE_FORMATS is only supposed to
> return texture formats "suitable for general-purpose usage."  Since
> GL_COMPRESSED_RGBA_S3TC_DXT1_EXT only has 1 bit for alpha, it's not
> suitable for general-purpose usage.  As far as I'm aware, nobody else
> returns that format either... and I think we have a piglit test.

Yes, we have a piglit test which expects 4 formats, not 3.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 6/6] mesa: fix NUM_COMPRESSED_TEXTURE_FORMATS query

2013-10-31 Thread Ian Romanick

On 10/31/2013 10:21 AM, Marek Olšák wrote:
> On Thu, Oct 31, 2013 at 6:19 PM, Ian Romanick  wrote:
>> On 10/31/2013 08:42 AM, Marek Olšák wrote:
>>> From: Marek Olšák 
>>>
>>> Cc: mesa-sta...@lists.freedesktop.org
>>
>> This is not correct.  GL_COMPRESSED_TEXTURE_FORMATS is only supposed to
>> return texture formats "suitable for general-purpose usage."  Since
>> GL_COMPRESSED_RGBA_S3TC_DXT1_EXT only has 1 bit for alpha, it's not
>> suitable for general-purpose usage.  As far as I'm aware, nobody else
>> returns that format either... and I think we have a piglit test.
> 
> Yes, we have a piglit test which expects 4 formats, not 3.

tests/spec/arb_texture_compression/invalid-formats.c expects that
GL_COMPRESSED_RGBA_S3TC_DXT1_EXT is not included in
GL_COMPRESSED_TEXTURE_FORMATS.

It looks like tests/spec/gl-3.1/minmax.c expects
GL_NUM_COMPRESSED_TEXTURE_FORMATS be at least 4, but I have no idea what
it's expecting GL_COMPRESSED_TEXTURE_FORMATS will be.  There are no
general-purpose compression formats part of the spec (only RGTC
formats), so... I think that's also a spec bug, and I've submitted a
Khronos bug for that too.  Ugh. :(

> Marek

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] i965/vec4: Don't overwrite op[1] when doing a UBO load.

2013-10-31 Thread Paul Berry

On 30 October 2013 17:45, Eric Anholt  wrote:

> Prior to the GLSL CSE pass, all of our testing happened to have a freshly
> computed temporary in op[1], from the multiply by 16 to get a byte offset.
> As of CSE you'll get var_refs of a reused value when you've got multiple
> loads from the same offset.
>
> Make a proper temporary for computing our temporary value, to avoid
> shifting the value farther and farther down.  Avoids a regression in
> gs-float-array-variable-index
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 7f2ca95..c5d0679 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -1569,7 +1569,7 @@ vec4_visitor::visit(ir_expression *ir)
>ir_constant *uniform_block = ir->operands[0]->as_constant();
>ir_constant *const_offset_ir = ir->operands[1]->as_constant();
>unsigned const_offset = const_offset_ir ?
> const_offset_ir->value.u[0] : 0;
> -  src_reg offset = op[1];
> +  src_reg offset;
>
>/* Now, load the vector from that offset. */
>assert(ir->type->is_vector() || ir->type->is_scalar());
> @@ -1581,7 +1581,8 @@ vec4_visitor::visit(ir_expression *ir)
>if (const_offset_ir) {
>   offset = src_reg(const_offset / 16);
>} else {
> - emit(SHR(dst_reg(offset), offset, src_reg(4)));
> + offset = src_reg(this, glsl_type::uint_type);
> + emit(SHR(dst_reg(offset), op[1], src_reg(4)));
>}
>
>vec4_instruction *pull =
> --
> 1.8.4.rc3
>

Reviewed-by: Paul Berry 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] collected review for GL_ARB_vertex_attrib_binding

2013-10-31 Thread Eric Anholt

While reviewing the patch series, I got to wondering if my patch 1/5 idea
would clean things up, so instead of suggesting you try writing the patch,
I wrote it, and then just wrote the rest of my review as patches. If you
squash it all into your series, the whole set is:

Reviewed-by: Eric Anholt 

This series of stuff is available at "vab" of my tree.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] corresponding cleanup for patch 11: Optimize rebinding the same VBO

2013-10-31 Thread Eric Anholt

---
 src/mesa/main/varray.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
index e7ff8d7..0087096 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -1347,10 +1347,10 @@ void GLAPIENTRY
 _mesa_BindVertexBuffer(GLuint bindingIndex, GLuint buffer, GLintptr offset,
GLsizei stride)
 {
+   GET_CURRENT_CONTEXT(ctx);
+   const struct gl_array_object *arrayObj = ctx->Array.ArrayObj;
struct gl_buffer_object *vbo;
-   struct gl_vertex_buffer_binding *binding;
 
-   GET_CURRENT_CONTEXT(ctx);
ASSERT_OUTSIDE_BEGIN_END(ctx);
 
/* The ARB_vertex_attrib_binding spec says:
@@ -1395,10 +1395,8 @@ _mesa_BindVertexBuffer(GLuint bindingIndex, GLuint 
buffer, GLintptr offset,
   return;
}
 
-   binding = gl_vertex_buffer_binding(ctx, VERT_ATTRIB_GENERIC(bindingIndex));
-
-   if (buffer == binding->BufferObj->Name) {
-  vbo = binding->BufferObj;
+   if (buffer == arrayObj->VertexBinding[bindingIndex].BufferObj->Name) {
+  vbo = arrayObj->VertexBinding[bindingIndex].BufferObj;
} else if (buffer != 0) {
   vbo = _mesa_lookup_bufferobj(ctx, buffer);
 
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 5/5] mesa: Fix bug about "what if we didn't find the VBO"?

2013-10-31 Thread Eric Anholt

There was some spec text, and what there isn't text for we have obvious
intended behavior from other buffer object bindings.
---
 src/mesa/main/bufferobj.c | 16 
 src/mesa/main/bufferobj.h |  8 +++-
 src/mesa/main/varray.c| 19 ++-
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
index 2d57cab..ef5fbce 100644
--- a/src/mesa/main/bufferobj.c
+++ b/src/mesa/main/bufferobj.c
@@ -655,11 +655,11 @@ _mesa_free_buffer_objects( struct gl_context *ctx )
}
 }
 
-static bool
-handle_bind_buffer_gen(struct gl_context *ctx,
-  GLenum target,
-  GLuint buffer,
-  struct gl_buffer_object **buf_handle)
+bool
+_mesa_handle_bind_buffer_gen(struct gl_context *ctx,
+ GLenum target,
+ GLuint buffer,
+ struct gl_buffer_object **buf_handle)
 {
struct gl_buffer_object *buf = *buf_handle;
 
@@ -719,7 +719,7 @@ bind_buffer_object(struct gl_context *ctx, GLenum target, 
GLuint buffer)
else {
   /* non-default buffer object */
   newBufObj = _mesa_lookup_bufferobj(ctx, buffer);
-  if (!handle_bind_buffer_gen(ctx, target, buffer, &newBufObj))
+  if (!_mesa_handle_bind_buffer_gen(ctx, target, buffer, &newBufObj))
  return;
}

@@ -2181,7 +2181,7 @@ _mesa_BindBufferRange(GLenum target, GLuint index,
} else {
   bufObj = _mesa_lookup_bufferobj(ctx, buffer);
}
-   if (!handle_bind_buffer_gen(ctx, target, buffer, &bufObj))
+   if (!_mesa_handle_bind_buffer_gen(ctx, target, buffer, &bufObj))
   return;
 
if (!bufObj) {
@@ -2227,7 +2227,7 @@ _mesa_BindBufferBase(GLenum target, GLuint index, GLuint 
buffer)
} else {
   bufObj = _mesa_lookup_bufferobj(ctx, buffer);
}
-   if (!handle_bind_buffer_gen(ctx, target, buffer, &bufObj))
+   if (!_mesa_handle_bind_buffer_gen(ctx, target, buffer, &bufObj))
   return;
 
if (!bufObj) {
diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h
index 9b582f8c..503223d 100644
--- a/src/mesa/main/bufferobj.h
+++ b/src/mesa/main/bufferobj.h
@@ -28,7 +28,7 @@
 #ifndef BUFFEROBJ_H
 #define BUFFEROBJ_H
 
-
+#include 
 #include "mtypes.h"
 
 
@@ -88,6 +88,12 @@ _mesa_reference_buffer_object(struct gl_context *ctx,
   _mesa_reference_buffer_object_(ctx, ptr, bufObj);
 }
 
+bool
+_mesa_handle_bind_buffer_gen(struct gl_context *ctx,
+ GLenum target,
+ GLuint buffer,
+ struct gl_buffer_object **buf_handle);
+
 extern GLuint
 _mesa_total_buffer_object_memory(struct gl_context *ctx);
 
diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
index ed3d047..71d13a7 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -1400,17 +1400,18 @@ _mesa_BindVertexBuffer(GLuint bindingIndex, GLuint 
buffer, GLintptr offset,
} else if (buffer != 0) {
   vbo = _mesa_lookup_bufferobj(ctx, buffer);
 
-  /* The ARB_vertex_attrib_binding spec doesn't specify that an error
-   * should be generated when  doesn't refer to a valid buffer
-   * object, but we assume that this is an oversight.
+  /* From the GL_ARB_vertex_attrib_array spec:
+   *
+   *   "[Core profile only:]
+   *An INVALID_OPERATION error is generated if buffer is not zero or a
+   *name returned from a previous call to GenBuffers, or if such a name
+   *has since been deleted with DeleteBuffers.
+   *
+   * Otherwise, we fall back to the same compat profile behavior as other
+   * object references (automatically gen it).
*/
-  if (!vbo) {
- _mesa_error(ctx, GL_INVALID_OPERATION,
- "glBindVertexBuffer(buffer=%u is not a valid "
- "buffer object)",
- buffer);
+  if (!_mesa_handle_bind_buffer_gen(ctx, GL_ARRAY_BUFFER, buffer, &vbo))
  return;
-  }
} else {
   /* The ARB_vertex_attrib_binding spec says:
*
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/5] corresponding cleanup for patch 7: getters

2013-10-31 Thread Eric Anholt

---
 src/mesa/main/varray.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
index f8837e8..e7ff8d7 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -708,6 +708,7 @@ static GLuint
 get_vertex_array_attrib(struct gl_context *ctx, GLuint index, GLenum pname,
   const char *caller)
 {
+   const struct gl_array_object *arrayObj = ctx->Array.ArrayObj;
const struct gl_vertex_attrib_array *array;
 
if (index >= ctx->Const.VertexProgram.MaxAttribs) {
@@ -715,7 +716,7 @@ get_vertex_array_attrib(struct gl_context *ctx, GLuint 
index, GLenum pname,
   return 0;
}
 
-   array = gl_vertex_attrib_array(ctx, VERT_ATTRIB_GENERIC(index));
+   array = &arrayObj->VertexAttrib[VERT_ATTRIB_GENERIC(index)];
 
switch (pname) {
case GL_VERTEX_ATTRIB_ARRAY_ENABLED_ARB:
@@ -729,7 +730,7 @@ get_vertex_array_attrib(struct gl_context *ctx, GLuint 
index, GLenum pname,
case GL_VERTEX_ATTRIB_ARRAY_NORMALIZED_ARB:
   return array->Normalized;
case GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_ARB:
-  return gl_vertex_buffer_binding(ctx, 
array->VertexBinding)->BufferObj->Name;
+  return arrayObj->VertexBinding[array->VertexBinding].BufferObj->Name;
case GL_VERTEX_ATTRIB_ARRAY_INTEGER:
   if ((_mesa_is_desktop_gl(ctx)
&& (ctx->Version >= 30 || ctx->Extensions.EXT_gpu_shader4))
@@ -740,7 +741,7 @@ get_vertex_array_attrib(struct gl_context *ctx, GLuint 
index, GLenum pname,
case GL_VERTEX_ATTRIB_ARRAY_DIVISOR_ARB:
   if ((_mesa_is_desktop_gl(ctx) && ctx->Extensions.ARB_instanced_arrays)
   || _mesa_is_gles3(ctx)) {
- return gl_vertex_buffer_binding(ctx, 
array->VertexBinding)->InstanceDivisor;
+ return arrayObj->VertexBinding[array->VertexBinding].InstanceDivisor;
   }
case GL_VERTEX_ATTRIB_BINDING:
   if (_mesa_is_desktop_gl(ctx)) {
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 4/5] Patch 5: whitespace consistency.

2013-10-31 Thread Eric Anholt

---
 src/mesa/main/varray.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
index 0087096..ed3d047 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -1495,7 +1495,7 @@ _mesa_VertexAttribIFormat(GLuint attribIndex, GLint size, 
GLenum type,
 * - ..."
 */
if (ctx->API == API_OPENGL_CORE &&
-  ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
+   ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glVertexAttribIFormat(No array object bound)");
   return;
@@ -1543,7 +1543,7 @@ _mesa_VertexAttribLFormat(GLuint attribIndex, GLint size, 
GLenum type,
 * that this is an oversight.
 */
if (ctx->API == API_OPENGL_CORE &&
-  ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
+   ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glVertexAttribLFormat(No array object bound)");
   return;
@@ -1583,7 +1583,7 @@ _mesa_VertexAttribBinding(GLuint attribIndex, GLuint 
bindingIndex)
 * is bound."
 */
if (ctx->API == API_OPENGL_CORE &&
-  ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
+   ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glVertexAttribBinding(No array object bound)");
   return;
@@ -1632,7 +1632,7 @@ _mesa_VertexBindingDivisor(GLuint bindingIndex, GLuint 
divisor)
 * is bound."
 */
if (ctx->API == API_OPENGL_CORE &&
-  ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
+   ctx->Array.ArrayObj == ctx->Array.DefaultArrayObj) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glVertexBindingDivisor(No array object bound)");
   return;
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/5] mesa: Clean up vertex attrib binding patch.

2013-10-31 Thread Eric Anholt

I didn't like how the getters looked in the code (I have a hard enough
time keeping the ArrayObj members straight without moving some of their
names farther from their usage), so I tried this cleanup.  It is intended
to be squashed with "[PATCH 05/11] mesa: Add ARB_vertex_attrib_binding"
---
 src/mesa/main/varray.c | 50 +++---
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
index 93581ea..f8837e8 100644
--- a/src/mesa/main/varray.c
+++ b/src/mesa/main/varray.c
@@ -101,23 +101,6 @@ type_to_bit(const struct gl_context *ctx, GLenum type)
}
 }
 
-
-static inline struct gl_vertex_attrib_array *
-gl_vertex_attrib_array(struct gl_context *ctx, GLuint index)
-{
-   ASSERT(index < Elements(arrayObj->VertexAttrib));
-   return &ctx->Array.ArrayObj->VertexAttrib[index];
-}
-
-
-static inline struct gl_vertex_buffer_binding *
-gl_vertex_buffer_binding(struct gl_context *ctx, GLuint index)
-{
-   ASSERT(index < Elements(arrayObj->VertexBinding));
-   return &ctx->Array.ArrayObj->VertexBinding[index];
-}
-
-
 /**
  * Sets the VertexBinding field in the vertex attribute given by attribIndex.
  */
@@ -125,24 +108,20 @@ static void
 vertex_attrib_binding(struct gl_context *ctx, GLuint attribIndex,
   GLuint bindingIndex)
 {
-   struct gl_vertex_attrib_array *array =
-   gl_vertex_attrib_array(ctx, attribIndex);
+   struct gl_array_object *arrayObj = ctx->Array.ArrayObj;
+   struct gl_vertex_attrib_array *array = &arrayObj->VertexAttrib[attribIndex];
 
if (array->VertexBinding != bindingIndex) {
   const GLbitfield64 array_bit = VERT_BIT(attribIndex);
-  struct gl_vertex_buffer_binding *new_binding =
-  gl_vertex_buffer_binding(ctx, bindingIndex);
-  struct gl_vertex_buffer_binding *old_binding =
-  gl_vertex_buffer_binding(ctx, array->VertexBinding);
 
   FLUSH_VERTICES(ctx, _NEW_ARRAY);
 
-  array->VertexBinding = bindingIndex;
+  arrayObj->VertexBinding[array->VertexBinding]._BoundArrays &= ~array_bit;
+  arrayObj->VertexBinding[bindingIndex]._BoundArrays |= array_bit;
 
-  old_binding->_BoundArrays &= ~array_bit;
-  new_binding->_BoundArrays |= array_bit;
+  array->VertexBinding = bindingIndex;
 
-  ctx->Array.ArrayObj->NewArrays |= array_bit;
+  arrayObj->NewArrays |= array_bit;
}
 }
 
@@ -156,8 +135,8 @@ bind_vertex_buffer(struct gl_context *ctx, GLuint index,
struct gl_buffer_object *vbo,
GLintptr offset, GLsizei stride)
 {
-   struct gl_vertex_buffer_binding *binding =
-   gl_vertex_buffer_binding(ctx, index);
+   struct gl_array_object *arrayObj = ctx->Array.ArrayObj;
+   struct gl_vertex_buffer_binding *binding = &arrayObj->VertexBinding[index];
 
if (binding->BufferObj != vbo ||
binding->Offset != offset ||
@@ -170,7 +149,7 @@ bind_vertex_buffer(struct gl_context *ctx, GLuint index,
   binding->Offset = offset;
   binding->Stride = stride;
 
-  ctx->Array.ArrayObj->NewArrays |= binding->_BoundArrays;
+  arrayObj->NewArrays |= binding->_BoundArrays;
}
 }
 
@@ -183,13 +162,14 @@ static void
 vertex_binding_divisor(struct gl_context *ctx, GLuint bindingIndex,
GLuint divisor)
 {
+   struct gl_array_object *arrayObj = ctx->Array.ArrayObj;
struct gl_vertex_buffer_binding *binding =
-   gl_vertex_buffer_binding(ctx, bindingIndex);
+  &arrayObj->VertexBinding[bindingIndex];
 
if (binding->InstanceDivisor != divisor) {
   FLUSH_VERTICES(ctx, _NEW_ARRAY);
   binding->InstanceDivisor = divisor;
-  ctx->Array.ArrayObj->NewArrays |= binding->_BoundArrays;
+  arrayObj->NewArrays |= binding->_BoundArrays;
}
 }
 
@@ -336,7 +316,7 @@ update_array_format(struct gl_context *ctx,
elementSize = _mesa_bytes_per_vertex_attrib(size, type);
assert(elementSize != -1);
 
-   array = gl_vertex_attrib_array(ctx, attrib);
+   array = &ctx->Array.ArrayObj->VertexAttrib[attrib];
array->Size = size;
array->Type = type;
array->Format = format;
@@ -429,7 +409,7 @@ update_array(struct gl_context *ctx,
vertex_attrib_binding(ctx, attrib, attrib);
 
/* The Stride and Ptr fields are not set by update_array_format() */
-   array = gl_vertex_attrib_array(ctx, attrib);
+   array = &ctx->Array.ArrayObj->VertexAttrib[attrib];
array->Stride = stride;
array->Ptr = (const GLvoid *) ptr;
 
@@ -1404,7 +1384,7 @@ _mesa_BindVertexBuffer(GLuint bindingIndex, GLuint 
buffer, GLintptr offset,
 */
if (offset < 0) {
   _mesa_error(ctx, GL_INVALID_VALUE,
-  "glBindVertexBuffer(offset=%d < 0)", offset);
+  "glBindVertexBuffer(offset=%lld < 0)", (long long)offset);
   return;
}
 
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.f

Re: [Mesa-dev] [PATCH 6/6] mesa: fix NUM_COMPRESSED_TEXTURE_FORMATS query

2013-10-31 Thread Ian Romanick

On 10/31/2013 10:45 AM, Ian Romanick wrote:
> On 10/31/2013 10:21 AM, Marek Olšák wrote:
>> On Thu, Oct 31, 2013 at 6:19 PM, Ian Romanick  wrote:
>>> On 10/31/2013 08:42 AM, Marek Olšák wrote:
 From: Marek Olšák 

 Cc: mesa-sta...@lists.freedesktop.org
>>>
>>> This is not correct.  GL_COMPRESSED_TEXTURE_FORMATS is only supposed to
>>> return texture formats "suitable for general-purpose usage."  Since
>>> GL_COMPRESSED_RGBA_S3TC_DXT1_EXT only has 1 bit for alpha, it's not
>>> suitable for general-purpose usage.  As far as I'm aware, nobody else
>>> returns that format either... and I think we have a piglit test.
>>
>> Yes, we have a piglit test which expects 4 formats, not 3.
> 
> tests/spec/arb_texture_compression/invalid-formats.c expects that
> GL_COMPRESSED_RGBA_S3TC_DXT1_EXT is not included in
> GL_COMPRESSED_TEXTURE_FORMATS.
> 
> It looks like tests/spec/gl-3.1/minmax.c expects
> GL_NUM_COMPRESSED_TEXTURE_FORMATS be at least 4, but I have no idea what
> it's expecting GL_COMPRESSED_TEXTURE_FORMATS will be.  There are no
> general-purpose compression formats part of the spec (only RGTC
> formats), so... I think that's also a spec bug, and I've submitted a
> Khronos bug for that too.  Ugh. :(

Okay... there already was a bug, and it was fixed in OpenGL 4.2.  We
should change those minmax tests to only expect 0 for
GL_NUM_COMPRESSED_TEXTURE_FORMATS.

>> Marek
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 5/6] docs/GL3: document radeonsi support, minor cleanup

2013-10-31 Thread Matt Turner

On Thu, Oct 31, 2013 at 10:05 AM, Ian Romanick  wrote:
>> +GL_ARB_shading_language_420pack  DONE (i965, r600, 
>> radeonsi)
>
> Because of the way it was implemented, I think this should say "All
> drivers that support GLSL 1.30."

In a similar vein, I wonder if there's any value at all to listing
i915, r300, and swrast in this file (the drivers that will never do
GL3). At least for me, the usefulness of the file has always been to
see how far off some capable hardware's driver is from supporting a GL
version.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 00/12] Implement GL_ARB_sample_shading on Intel hardware

2013-10-31 Thread Anuj Phogat

Changes in V3: Incorporate review comments from Brian, Ian and Paul. Major
changes are related to fixing code generation for simd16 and enabling simd16
dispatch on gen6, gen7.

Although I've received r-b on all the patches (except patch 6/12) in this
series, I'm sending out the final version of patches before I push them
upstream.

I verified the implementation with a number of piglit tests, currently under
review on piglit mailing list. Observed no piglit, gles3 CTS regressions with
these patches on SNB, IVB & HSW.
These patches can also be found at my github branch:
https://github.com/aphogat/mesa.git branch: sample-shading-11

Anuj Phogat (12):
  mesa: Add infrastructure for GL_ARB_sample_shading
  mesa: Add new functions and enums required by GL_ARB_sample_shading
  mesa: Pass number of samples as a program state variable
  glsl: Add new builtins required by GL_ARB_sample_shading
  mesa: Add a helper function _mesa_get_min_invocations_per_fragment()
  i965: Don't do vector splitting for ir_var_system_value
  i965: Add FS backend for builtin gl_SamplePosition
  i965: Add FS backend for builtin gl_SampleID
  i965: Add FS backend for builtin gl_SampleMask[]
  i965/gen6: Enable the features required for GL_ARB_sample_shading
  i965/gen7: Enable the features required for GL_ARB_sample_shading
  i965: Enable ARB_sample_shading on intel hardware >= gen6

 src/glsl/builtin_variables.cpp |  18 +++
 src/glsl/glcpp/glcpp-parse.y   |   3 +
 src/glsl/glsl_parser_extras.cpp|   1 +
 src/glsl/glsl_parser_extras.h  |   2 +
 src/glsl/standalone_scaffolding.cpp|   1 +
 src/mapi/glapi/gen/ARB_sample_shading.xml  |  19 +++
 src/mapi/glapi/gen/GL4x.xml|  21 
 src/mapi/glapi/gen/Makefile.am |   5 +-
 src/mapi/glapi/gen/gl_API.xml  |   4 +-
 src/mesa/drivers/dri/i965/brw_context.h|   2 +
 src/mesa/drivers/dri/i965/brw_defines.h|   2 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 127 +
 src/mesa/drivers/dri/i965/brw_fs.h |  14 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  71 
 .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |   1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  19 +++
 src/mesa/drivers/dri/i965/brw_wm.c |  12 ++
 src/mesa/drivers/dri/i965/brw_wm.h |   3 +
 src/mesa/drivers/dri/i965/gen6_wm_state.c  |  61 +-
 src/mesa/drivers/dri/i965/gen7_wm_state.c  |  61 +-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   1 +
 src/mesa/main/enable.c |  18 +++
 src/mesa/main/extensions.c |   1 +
 src/mesa/main/get.c|   8 ++
 src/mesa/main/get_hash_params.py   |   3 +
 src/mesa/main/mtypes.h |  13 ++-
 src/mesa/main/multisample.c|  18 +++
 src/mesa/main/multisample.h|   2 +
 src/mesa/main/tests/dispatch_sanity.cpp|   4 +-
 src/mesa/program/prog_print.c  |   1 +
 src/mesa/program/prog_statevars.c  |  11 ++
 src/mesa/program/prog_statevars.h  |   2 +
 src/mesa/program/program.c |  32 ++
 src/mesa/program/program.h |   3 +
 34 files changed, 549 insertions(+), 15 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml
 create mode 100644 src/mapi/glapi/gen/GL4x.xml

-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 01/12] mesa: Add infrastructure for GL_ARB_sample_shading

2013-10-31 Thread Anuj Phogat

This patch implements the common support code required for the
GL_ARB_sample_shading extension.

V2: Move GL_ARB_sample_shading to ARB extension list.

Signed-off-by: Anuj Phogat 
Reviewed-by: Ian Romanick 
Reviewed-by: Ken Graunke 
---
 src/glsl/glcpp/glcpp-parse.y| 3 +++
 src/glsl/glsl_parser_extras.cpp | 1 +
 src/glsl/glsl_parser_extras.h   | 2 ++
 src/glsl/standalone_scaffolding.cpp | 1 +
 src/mesa/main/extensions.c  | 1 +
 src/mesa/main/mtypes.h  | 1 +
 6 files changed, 9 insertions(+)

diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index 86f3cd5..7edc274 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -1249,6 +1249,9 @@ glcpp_parser_create (const struct gl_extensions 
*extensions, int api)
  if (extensions->ARB_shading_language_420pack)
 add_builtin_define(parser, "GL_ARB_shading_language_420pack", 
1);
 
+ if (extensions->ARB_sample_shading)
+add_builtin_define(parser, "GL_ARB_sample_shading", 1);
+
  if (extensions->EXT_shader_integer_mix)
 add_builtin_define(parser, "GL_EXT_shader_integer_mix", 1);
 
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 77e8816..b9d3ba1 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -540,6 +540,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(EXT_shader_integer_mix, true,  true,  
EXT_shader_integer_mix),
EXT(ARB_texture_gather, true,  false, ARB_texture_gather),
EXT(ARB_shader_atomic_counters, true,  false, 
ARB_shader_atomic_counters),
+   EXT(ARB_sample_shading, true,  false, ARB_sample_shading),
 };
 
 #undef EXT
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index f22dac3..345d7a0 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -350,6 +350,8 @@ struct _mesa_glsl_parse_state {
bool AMD_vertex_shader_layer_warn;
bool ARB_shading_language_420pack_enable;
bool ARB_shading_language_420pack_warn;
+   bool ARB_sample_shading_enable;
+   bool ARB_sample_shading_warn;
bool EXT_shader_integer_mix_enable;
bool EXT_shader_integer_mix_warn;
bool ARB_shader_atomic_counters_enable;
diff --git a/src/glsl/standalone_scaffolding.cpp 
b/src/glsl/standalone_scaffolding.cpp
index 7a1cf68..cbff6d1 100644
--- a/src/glsl/standalone_scaffolding.cpp
+++ b/src/glsl/standalone_scaffolding.cpp
@@ -97,6 +97,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, 
gl_api api)
ctx->Extensions.ARB_explicit_attrib_location = true;
ctx->Extensions.ARB_fragment_coord_conventions = true;
ctx->Extensions.ARB_gpu_shader5 = true;
+   ctx->Extensions.ARB_sample_shading = true;
ctx->Extensions.ARB_shader_bit_encoding = true;
ctx->Extensions.ARB_shader_stencil_export = true;
ctx->Extensions.ARB_shader_texture_lod = true;
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 285ec37..48c4e9f 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -118,6 +118,7 @@ static const struct extension extension_table[] = {
{ "GL_ARB_point_sprite",o(ARB_point_sprite),
GL, 2003 },
{ "GL_ARB_provoking_vertex",o(EXT_provoking_vertex),
GL, 2009 },
{ "GL_ARB_robustness",  o(dummy_true),  
GL, 2010 },
+   { "GL_ARB_sample_shading",  o(ARB_sample_shading),  
GL, 2009 },
{ "GL_ARB_sampler_objects", o(dummy_true),  
GL, 2009 },
{ "GL_ARB_seamless_cube_map",   o(ARB_seamless_cube_map),   
GL, 2009 },
{ "GL_ARB_shader_atomic_counters",  
o(ARB_shader_atomic_counters),  GL, 2011 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 087bc37..476888b 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3245,6 +3245,7 @@ struct gl_extensions
GLboolean ARB_occlusion_query;
GLboolean ARB_occlusion_query2;
GLboolean ARB_point_sprite;
+   GLboolean ARB_sample_shading;
GLboolean ARB_seamless_cube_map;
GLboolean ARB_shader_atomic_counters;
GLboolean ARB_shader_bit_encoding;
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 07/12] i965: Add FS backend for builtin gl_SamplePosition

2013-10-31 Thread Anuj Phogat

V2:
   - Update comments.
   - Add compute_pos_offset variable in brw_wm_prog_key.
   - Add variable uses_pos_offset in brw_wm_prog_data.

V3:
   - Make changes to support simd16 mode.

Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
---
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp | 79 
 src/mesa/drivers/dri/i965/brw_fs.h   |  2 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  5 ++
 src/mesa/drivers/dri/i965/brw_wm.c   |  6 +++
 src/mesa/drivers/dri/i965/brw_wm.h   |  2 +
 6 files changed, 95 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index d30c963..665807c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -383,6 +383,7 @@ struct brw_wm_prog_data {
GLuint nr_params;   /**< number of float params/constants */
GLuint nr_pull_params;
bool dual_src_blend;
+   bool uses_pos_offset;
uint32_t prog_offset_16;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 236e86c..d2380c8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1145,6 +1145,78 @@ fs_visitor::emit_frontfacing_interpolation(ir_variable 
*ir)
return reg;
 }
 
+void
+fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_F);
+
+   if (c->key.compute_pos_offset) {
+  /* Convert int_sample_pos to floating point */
+  emit(MOV(dst, int_sample_pos));
+  /* Scale to the range [0, 1] */
+  emit(MUL(dst, dst, fs_reg(1 / 16.0f)));
+   }
+   else {
+  /* From ARB_sample_shading specification:
+   * "When rendering to a non-multisample buffer, or if multisample
+   *  rasterization is disabled, gl_SamplePosition will always be
+   *  (0.5, 0.5).
+   */
+  emit(MOV(dst, fs_reg(0.5f)));
+   }
+}
+
+fs_reg *
+fs_visitor::emit_samplepos_setup(ir_variable *ir)
+{
+   assert(brw->gen >= 6);
+   assert(ir->type == glsl_type::vec2_type);
+
+   this->current_annotation = "compute sample position";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   fs_reg pos = *reg;
+   fs_reg int_sample_x = fs_reg(this, glsl_type::int_type);
+   fs_reg int_sample_y = fs_reg(this, glsl_type::int_type);
+
+   /* WM will be run in MSDISPMODE_PERSAMPLE. So, only one of SIMD8 or SIMD16
+* mode will be enabled.
+*
+* From the Ivy Bridge PRM, volume 2 part 1, page 344:
+* R31.1:0 Position Offset X/Y for Slot[3:0]
+* R31.3:2 Position Offset X/Y for Slot[7:4]
+* .
+*
+* The X, Y sample positions come in as bytes in  thread payload. So, read
+* the positions using vstride=16, width=8, hstride=2.
+*/
+   struct brw_reg sample_pos_reg =
+  stride(retype(brw_vec1_grf(c->sample_pos_reg, 0),
+BRW_REGISTER_TYPE_B), 16, 8, 2);
+
+   emit(MOV(int_sample_x, fs_reg(sample_pos_reg)));
+   if (dispatch_width == 16) {
+  int_sample_x.sechalf = true;
+  fs_inst *inst = emit(MOV(int_sample_x,
+   fs_reg(suboffset(sample_pos_reg, 16;
+  inst->force_sechalf = true;
+  int_sample_x.sechalf = false;
+   }
+   /* Compute gl_SamplePosition.x */
+   compute_sample_position(pos, int_sample_x);
+   pos.reg_offset++;
+   emit(MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1;
+   if (dispatch_width == 16) {
+  int_sample_y.sechalf = true;
+  fs_inst *inst = emit(MOV(int_sample_y,
+   fs_reg(suboffset(sample_pos_reg, 17;
+  inst->force_sechalf = true;
+  int_sample_y.sechalf = false;
+   }
+   /* Compute gl_SamplePosition.y */
+   compute_sample_position(pos, int_sample_y);
+   return reg;
+}
+
 fs_reg
 fs_visitor::fix_math_operand(fs_reg src)
 {
@@ -3012,7 +3084,14 @@ fs_visitor::setup_payload_gen6()
  c->nr_payload_regs++;
   }
}
+
+   c->prog_data.uses_pos_offset = c->key.compute_pos_offset;
/* R31: MSAA position offsets. */
+   if (c->prog_data.uses_pos_offset) {
+  c->sample_pos_reg = c->nr_payload_regs;
+  c->nr_payload_regs++;
+   }
+
/* R32-: bary for 32-pixel. */
/* R58-59: interp W for 32-pixel. */
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 43e4761..7eb82fc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -340,9 +340,11 @@ public:
  glsl_interp_qualifier interpolation_mode,
  bool is_centroid);
fs_reg *emit_frontfacing_interpolation(ir_variable *ir);
+   fs_reg *emit_samplepos_setup(ir_variable *ir);
fs_reg *emit_general_interpolation(ir_variable *ir);
void emit_interpolation_setup_gen4();
void emit_interpolation_setup_gen6();
+   void compute_sample_position(fs_reg dst, fs_r

[Mesa-dev] [PATCH V3 08/12] i965: Add FS backend for builtin gl_SampleID

2013-10-31 Thread Anuj Phogat

V2:
   - Update comments
   - Add compute_sample_id variables in brw_wm_prog_key
   - Add a special backend instruction to compute sample_id.

V3:
   - Make changes to support simd16 mode.

Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 48 ++
 src/mesa/drivers/dri/i965/brw_fs.h |  7 
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 29 
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_wm.c |  6 
 src/mesa/drivers/dri/i965/brw_wm.h |  1 +
 7 files changed, 94 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index bad6d40..e03ac58 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -794,6 +794,7 @@ enum opcode {
FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
FS_OPCODE_MOV_DISPATCH_TO_FLAGS,
FS_OPCODE_DISCARD_JUMP,
+   FS_OPCODE_SET_SAMPLE_ID,
FS_OPCODE_SET_SIMD4X2_OFFSET,
FS_OPCODE_PACK_HALF_2x16_SPLIT,
FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d2380c8..625adfc 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1217,6 +1217,54 @@ fs_visitor::emit_samplepos_setup(ir_variable *ir)
return reg;
 }
 
+fs_reg *
+fs_visitor::emit_sampleid_setup(ir_variable *ir)
+{
+   assert(brw->gen >= 6);
+
+   this->current_annotation = "compute sample id";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+
+   if (c->key.compute_sample_id) {
+  fs_reg t1 = fs_reg(this, glsl_type::int_type);
+  fs_reg t2 = fs_reg(this, glsl_type::int_type);
+  t2.type = BRW_REGISTER_TYPE_UW;
+
+  /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
+   * 8x multisampling, subspan 0 will represent sample N (where N
+   * is 0, 2, 4 or 6), subspan 1 will represent sample 1, 3, 5 or
+   * 7. We can find the value of N by looking at R0.0 bits 7:6
+   * ("Starting Sample Pair Index (SSPI)") and multiplying by two
+   * (since samples are always delivered in pairs). That is, we
+   * compute 2*((R0.0 & 0xc0) >> 6) == (R0.0 & 0xc0) >> 5. Then
+   * we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1) in
+   * case of SIMD8 and sequence (0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2,
+   * 2, 3, 3, 3, 3) in case of SIMD16. We compute this sequence by
+   * populating a temporary variable with the sequence (0, 1, 2, 3),
+   * and then reading from it using vstride=1, width=4, hstride=0.
+   * These computations hold good for 4x multisampling as well.
+   */
+  emit(BRW_OPCODE_AND, t1,
+   fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
+   fs_reg(brw_imm_d(0xc0)));
+  emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5));
+  /* This works for both SIMD8 and SIMD16 */
+  emit(MOV(t2, brw_imm_v(0x3210)));
+  /* This special instruction takes care of setting vstride=1,
+   * width=4, hstride=0 of t2 during an ADD instruction.
+   */
+  emit(FS_OPCODE_SET_SAMPLE_ID, *reg, t1, t2);
+   } else {
+  /* As per GL_ARB_sample_shading specification:
+   * "When rendering to a non-multisample buffer, or if multisample
+   *  rasterization is disabled, gl_SampleID will always be zero."
+   */
+  emit(BRW_OPCODE_MOV, *reg, fs_reg(0));
+   }
+
+   return reg;
+}
+
 fs_reg
 fs_visitor::fix_math_operand(fs_reg src)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 7eb82fc..0ed8c05 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -341,6 +341,7 @@ public:
  bool is_centroid);
fs_reg *emit_frontfacing_interpolation(ir_variable *ir);
fs_reg *emit_samplepos_setup(ir_variable *ir);
+   fs_reg *emit_sampleid_setup(ir_variable *ir);
fs_reg *emit_general_interpolation(ir_variable *ir);
void emit_interpolation_setup_gen4();
void emit_interpolation_setup_gen6();
@@ -547,6 +548,12 @@ private:
  struct brw_reg index,
  struct brw_reg offset);
void generate_mov_dispatch_to_flags(fs_inst *inst);
+
+   void generate_set_sample_id(fs_inst *inst,
+   struct brw_reg dst,
+   struct brw_reg src0,
+   struct brw_reg src1);
+
void generate_set_simd4x2_offset(fs_inst *inst,
 struct brw_reg dst,
 struct brw_reg offset);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 672e896..2553098 100644
--- a/src/m

[Mesa-dev] [PATCH V3 11/12] i965/gen7: Enable the features required for GL_ARB_sample_shading

2013-10-31 Thread Anuj Phogat

- Enable GEN7_WM_MSDISPMODE_PERSAMPLE, GEN7_WM_POSOFFSET_SAMPLE,
  GEN7_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN7_WM_8_DISPATCH_ENABLE or GEN7_WM_16_DISPATCH_ENABLE
  when GEN7_WM_MSDISPMODE_PERSAMPLE is enabled. Refer IVB PRM Vol. 2, Part 1,
  Page 288 for details.

V2:
- Use shared function _mesa_get_min_invocations_per_fragment().
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.

V3:
- Enable simd16 dispatch with per sample shading.
- Make changes to give preference to 'simd16 only' mode over
  'simd8 only' mode in case of non 1x per sample shading.

Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 61 ---
 1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index a2046c3..58a6438 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -27,6 +27,7 @@
 #include "brw_defines.h"
 #include "brw_util.h"
 #include "brw_wm.h"
+#include "program/program.h"
 #include "program/prog_parameter.h"
 #include "program/prog_statevars.h"
 #include "intel_batchbuffer.h"
@@ -82,9 +83,13 @@ upload_wm_state(struct brw_context *brw)
   GEN7_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT;
 
/* _NEW_COLOR, _NEW_MULTISAMPLE */
+   /* Enable if the pixel shader kernel generates and outputs oMask.
+*/
if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
-   ctx->Multisample.SampleAlphaToCoverage)
+   ctx->Multisample.SampleAlphaToCoverage ||
+   brw->wm.prog_data->uses_omask) {
   dw1 |= GEN7_WM_KILL_ENABLE;
+   }
 
/* _NEW_BUFFERS */
if (brw_color_buffer_write_enabled(brw) || writes_depth ||
@@ -97,7 +102,11 @@ upload_wm_state(struct brw_context *brw)
  dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
   else
  dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
-  dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
+
+  if (_mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program) > 
1)
+ dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
+  else
+ dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
} else {
   dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
   dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
@@ -169,6 +178,32 @@ upload_ps_state(struct brw_context *brw)
if (brw->wm.prog_data->nr_params > 0)
   dw4 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
 
+   /* From the IVB PRM, volume 2 part 1, page 287:
+* "This bit is inserted in the PS payload header and made available to
+* the DataPort (either via the message header or via header bypass) to
+* indicate that oMask data (one or two phases) is included in Render
+* Target Write messages. If present, the oMask data is used to mask off
+* samples."
+*/
+   if (brw->wm.prog_data->uses_omask)
+  dw4 |= GEN7_PS_OMASK_TO_RENDER_TARGET;
+
+   /* From the IVB PRM, volume 2 part 1, page 287:
+* "If the PS kernel does not need the Position XY Offsets to
+* compute a Position Value, then this field should be programmed
+* to POSOFFSET_NONE."
+* "SW Recommendation: If the PS kernel needs the Position Offsets
+* to compute a Position XY value, this field should match Position
+* ZW Interpolation Mode to ensure a consistent position.xyzw
+* computation."
+* We only require XY sample offsets. So, this recommendation doesn't
+* look useful at the moment. We might need this in future.
+*/
+   if (brw->wm.prog_data->uses_pos_offset)
+  dw4 |= GEN7_PS_POSOFFSET_SAMPLE;
+   else
+  dw4 |= GEN7_PS_POSOFFSET_NONE;
+
/* CACHE_NEW_WM_PROG | _NEW_COLOR
 *
 * The hardware wedges if you have this bit set but don't turn on any dual
@@ -184,9 +219,22 @@ upload_ps_state(struct brw_context *brw)
if (brw->wm.prog_data->num_varying_inputs != 0)
   dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
 
-   dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
-   if (brw->wm.prog_data->prog_offset_16)
+   /* In case of non 1x per sample shading, only one of SIMD8 and SIMD16
+* should be enabled. We do 'SIMD16 only' dispatch if a SIMD16 shader
+* is successfully compiled. In majority of the cases that bring us
+* better performance than 'SIMD8 only' dispatch.
+*/
+   int min_inv_per_frag =
+  _mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program);
+   assert(min_inv_per_frag >= 1);
+
+   if (brw->wm.prog_data->prog_offset_16) {
   dw4 |= GEN7_PS_16_DISPATCH_ENABLE;
+  if (min_inv_per_frag == 1)
+ dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
+   }
+   else
+  dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
 
dw5 |= (brw->wm.prog_data->first_curbe_grf <<
   GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
@@ -195,7 +243,10 @@ upload_ps_state(struct brw_context *brw)
 
BEGIN_BATCH(8);
OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
-   OUT_BATCH(brw->wm.base.prog_offset);
+   if (brw->wm.prog_data->prog_offset_16 && min_inv_per_fra

[Mesa-dev] [PATCH V3 12/12] i965: Enable ARB_sample_shading on intel hardware >= gen6

2013-10-31 Thread Anuj Phogat

Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
Reviewed-by: Ken Graunke 
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index abc3176..5eae84f 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -209,6 +209,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.OES_depth_texture_cube_map = true;
   ctx->Extensions.ARB_shading_language_packing = true;
   ctx->Extensions.ARB_texture_multisample = true;
+  ctx->Extensions.ARB_sample_shading = true;
 
   /* Test if the kernel has the ioctl. */
   if (drm_intel_reg_read(brw->bufmgr, TIMESTAMP, &dummy) == 0)
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 06/12] i965: Don't do vector splitting for ir_var_system_value

2013-10-31 Thread Anuj Phogat

This is required while adding builtin system value vec{2, 3, 4}
variables. For example:
(declare (sys) vec2 gl_SamplePosition)

Without this patch above glsl ir splits in to:
(declare (temporary) float gl_SamplePosition_x)
(declare (temporary) float gl_SamplePosition_y)

Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
index eb7851b..6284b59 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
@@ -111,6 +111,7 @@ ir_vector_reference_visitor::get_variable_entry(ir_variable 
*var)
case ir_var_uniform:
case ir_var_shader_in:
case ir_var_shader_out:
+   case ir_var_system_value:
case ir_var_function_in:
case ir_var_function_out:
case ir_var_function_inout:
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 05/12] mesa: Add a helper function _mesa_get_min_invocations_per_fragment()

2013-10-31 Thread Anuj Phogat

This function is used to test if we need to do per sample shading or
per fragment shading.

V2: Use MAX2() to make sure the function returns a number >= 1.

Signed-off-by: Anuj Phogat 
Reviewed-by: Ian Romanick 
Reviewed-by: Paul Berry 
---
 src/mesa/program/program.c | 32 
 src/mesa/program/program.h |  3 +++
 2 files changed, 35 insertions(+)

diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
index 093d372..a102ec1 100644
--- a/src/mesa/program/program.c
+++ b/src/mesa/program/program.c
@@ -32,6 +32,7 @@
 #include "main/glheader.h"
 #include "main/context.h"
 #include "main/hash.h"
+#include "main/macros.h"
 #include "program.h"
 #include "prog_cache.h"
 #include "prog_parameter.h"
@@ -1024,3 +1025,34 @@ _mesa_postprocess_program(struct gl_context *ctx, struct 
gl_program *prog)
 
}
 }
+
+/* Gets the minimum number of shader invocations per fragment.
+ * This function is useful to determine if we need to do per
+ * sample shading or per fragment shading.
+ */
+GLint
+_mesa_get_min_invocations_per_fragment(struct gl_context *ctx,
+   const struct gl_fragment_program *prog)
+{
+   /* From ARB_sample_shading specification:
+* "Using gl_SampleID in a fragment shader causes the entire shader
+*  to be evaluated per-sample."
+*
+* "Using gl_SamplePosition in a fragment shader causes the entire
+*  shader to be evaluated per-sample."
+*
+* "If MULTISAMPLE or SAMPLE_SHADING_ARB is disabled, sample shading
+*  has no effect."
+*/
+   if (ctx->Multisample.Enabled) {
+  if (prog->Base.SystemValuesRead & (SYSTEM_BIT_SAMPLE_ID |
+ SYSTEM_BIT_SAMPLE_POS))
+ return MAX2(ctx->DrawBuffer->Visual.samples, 1);
+  else if (ctx->Multisample.SampleShading)
+ return MAX2(ceil(ctx->Multisample.MinSampleShadingValue *
+  ctx->DrawBuffer->Visual.samples), 1);
+  else
+ return 1;
+   }
+   return 1;
+}
diff --git a/src/mesa/program/program.h b/src/mesa/program/program.h
index 34965ab..353ccab 100644
--- a/src/mesa/program/program.h
+++ b/src/mesa/program/program.h
@@ -187,6 +187,9 @@ _mesa_valid_register_index(const struct gl_context *ctx,
 extern void
 _mesa_postprocess_program(struct gl_context *ctx, struct gl_program *prog);
 
+extern GLint
+_mesa_get_min_invocations_per_fragment(struct gl_context *ctx,
+   const struct gl_fragment_program *prog);
 
 static inline GLuint
 _mesa_program_target_to_index(GLenum v)
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading

2013-10-31 Thread Anuj Phogat

- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
  GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN6_WM_8_DISPATCH_ENABLE or GEN6_WM_16_DISPATCH_ENABLE
  when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
  Refer SNB PRM Vol. 2, Part 1, Page 279 for details.

V2:
- Use shared function _mesa_get_min_invocations_per_fragment().
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.

V3:
- Enable simd16 dispatch with per sample shading.
- Make changes to give preference to 'simd16 only' mode over
  'simd8 only' mode in case of non 1x per sample shading.

Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
---
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 61 ---
 1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index e3395ce..42d8789 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -30,6 +30,7 @@
 #include "brw_defines.h"
 #include "brw_util.h"
 #include "brw_wm.h"
+#include "program/program.h"
 #include "program/prog_parameter.h"
 #include "program/prog_statevars.h"
 #include "intel_batchbuffer.h"
@@ -153,9 +154,23 @@ upload_wm_state(struct brw_context *brw)
dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT;
 
/* CACHE_NEW_WM_PROG */
-   dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
-   if (brw->wm.prog_data->prog_offset_16)
+
+   /* In case of non 1x per sample shading, only one of SIMD8 and SIMD16
+* should be enabled. We do 'SIMD16 only' dispatch if a SIMD16 shader
+* is successfully compiled. In majority of the cases that bring us
+* better performance than 'SIMD8 only' dispatch.
+*/
+   int min_inv_per_frag =
+  _mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program);
+   assert(min_inv_per_frag >= 1);
+
+   if (brw->wm.prog_data->prog_offset_16) {
   dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
+  if (min_inv_per_frag == 1)
+ dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
+   }
+   else
+  dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
 
/* CACHE_NEW_WM_PROG | _NEW_COLOR */
if (brw->wm.prog_data->dual_src_blend &&
@@ -183,7 +198,8 @@ upload_wm_state(struct brw_context *brw)
 
/* _NEW_COLOR, _NEW_MULTISAMPLE */
if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
-   ctx->Multisample.SampleAlphaToCoverage)
+   ctx->Multisample.SampleAlphaToCoverage ||
+   brw->wm.prog_data->uses_omask)
   dw5 |= GEN6_WM_KILL_ENABLE;
 
if (brw_color_buffer_write_enabled(brw) ||
@@ -191,6 +207,16 @@ upload_wm_state(struct brw_context *brw)
   dw5 |= GEN6_WM_DISPATCH_ENABLE;
}
 
+   /* From the SNB PRM, volume 2 part 1, page 278:
+* "This bit is inserted in the PS payload header and made available to
+* the DataPort (either via the message header or via header bypass) to
+* indicate that oMask data (one or two phases) is included in Render
+* Target Write messages. If present, the oMask data is used to mask off
+* samples."
+*/
+if(brw->wm.prog_data->uses_omask)
+  dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
+
/* CACHE_NEW_WM_PROG */
dw6 |= brw->wm.prog_data->num_varying_inputs <<
   GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
@@ -200,15 +226,40 @@ upload_wm_state(struct brw_context *brw)
  dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
   else
  dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
-  dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
+
+  if (min_inv_per_frag > 1)
+ dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
+  else
+ dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
} else {
   dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
   dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
}
 
+   /* _NEW_BUFFERS, _NEW_MULTISAMPLE */
+   /* From the SNB PRM, volume 2 part 1, page 281:
+* "If the PS kernel does not need the Position XY Offsets
+* to compute a Position XY value, then this field should be
+* programmed to POSOFFSET_NONE."
+*
+* "SW Recommendation: If the PS kernel needs the Position Offsets
+* to compute a Position XY value, this field should match Position
+* ZW Interpolation Mode to ensure a consistent position.xyzw
+* computation."
+* We only require XY sample offsets. So, this recommendation doesn't
+* look useful at the moment. We might need this in future.
+*/
+   if (brw->wm.prog_data->uses_pos_offset)
+  dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
+   else
+  dw6 |= GEN6_WM_POSOFFSET_NONE;
+
BEGIN_BATCH(9);
OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));
-   OUT_BATCH(brw->wm.base.prog_offset);
+   if (brw->wm.prog_data->prog_offset_16 && min_inv_per_frag > 1)
+  OUT_BATCH(brw->wm.base.prog_offset + brw->wm.prog_data->prog_offset_16);
+   else
+  OUT_BATCH(brw->wm.base.prog_offset);
OUT_BATCH(dw2);
if (brw->wm.prog_data->total_scratch) {
   OUT_RELOC(brw->wm.base.scratch_bo,
-- 
1.8.1.4

___

[Mesa-dev] [PATCH V3 04/12] glsl: Add new builtins required by GL_ARB_sample_shading

2013-10-31 Thread Anuj Phogat

New builtins added by GL_ARB_sample_shading:
in vec2 gl_SamplePosition
in int gl_SampleID
in int gl_NumSamples
out int gl_SampleMask[]

V2: - Use SWIZZLE_ for STATE_NUM_SAMPLES.
- Use "result.samplemask" in arb_output_attrib_string.
- Add comment to explain the size of gl_SampleMask[] array.
- Make gl_SampleID and gl_SamplePosition system values.
Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
Reviewed-by: Ian Romanick 
---
 src/glsl/builtin_variables.cpp | 18 ++
 src/mesa/main/mtypes.h | 10 +-
 src/mesa/program/prog_print.c  |  1 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
index 7a3505a..4d44104 100644
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -30,6 +30,9 @@
 #include "program/prog_statevars.h"
 #include "program/prog_instruction.h"
 
+static struct gl_builtin_uniform_element gl_NumSamples_elements[] = {
+   {NULL, {STATE_NUM_SAMPLES, 0, 0}, SWIZZLE_}
+};
 
 static struct gl_builtin_uniform_element gl_DepthRange_elements[] = {
{"near", {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_},
@@ -236,6 +239,7 @@ static struct gl_builtin_uniform_element 
gl_NormalMatrix_elements[] = {
 #define STATEVAR(name) {#name, name ## _elements, Elements(name ## _elements)}
 
 static const struct gl_builtin_uniform_desc _mesa_builtin_uniform_desc[] = {
+   STATEVAR(gl_NumSamples),
STATEVAR(gl_DepthRange),
STATEVAR(gl_ClipPlane),
STATEVAR(gl_Point),
@@ -662,6 +666,7 @@ builtin_variable_generator::generate_constants()
 void
 builtin_variable_generator::generate_uniforms()
 {
+   add_uniform(int_t, "gl_NumSamples");
add_uniform(type("gl_DepthRangeParameters"), "gl_DepthRange");
add_uniform(array(vec4_t, VERT_ATTRIB_MAX), "gl_CurrentAttribVertMESA");
add_uniform(array(vec4_t, VARYING_SLOT_MAX), "gl_CurrentAttribFragMESA");
@@ -838,6 +843,19 @@ builtin_variable_generator::generate_fs_special_vars()
   if (state->AMD_shader_stencil_export_warn)
  var->warn_extension = "GL_AMD_shader_stencil_export";
}
+
+   if (state->ARB_sample_shading_enable) {
+  add_system_value(SYSTEM_VALUE_SAMPLE_ID, int_t, "gl_SampleID");
+  add_system_value(SYSTEM_VALUE_SAMPLE_POS, vec2_t, "gl_SamplePosition");
+  /* From the ARB_sample_shading specification:
+   *"The number of elements in the array is ceil(/32), where
+   * is the maximum number of color samples supported by the
+   *implementation."
+   * Since no drivers expose more than 32x MSAA, we can simply set
+   * the array size to 1 rather than computing it.
+   */
+  add_output(FRAG_RESULT_SAMPLE_MASK, array(int_t, 1), "gl_SampleMask");
+   }
 }
 
 
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 4774c8e..b5c5583 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -274,6 +274,11 @@ typedef enum
 #define VARYING_BIT_VAR(V) BITFIELD64_BIT(VARYING_SLOT_VAR0 + (V))
 /*@}*/
 
+/**
+ * Bitflags for system values.
+ */
+#define SYSTEM_BIT_SAMPLE_ID BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_ID)
+#define SYSTEM_BIT_SAMPLE_POS BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_POS)
 
 /**
  * Determine if the given gl_varying_slot appears in the fragment shader.
@@ -306,12 +311,13 @@ typedef enum
 * register is written.  No FRAG_RESULT_DATAn will be written.
 */
FRAG_RESULT_COLOR = 2,
+   FRAG_RESULT_SAMPLE_MASK = 3,
 
/* FRAG_RESULT_DATAn are the per-render-target (GLSL gl_FragData[n]
 * or ARB_fragment_program fragment.color[n]) color results.  If
 * any are written, FRAG_RESULT_COLOR will not be written.
 */
-   FRAG_RESULT_DATA0 = 3,
+   FRAG_RESULT_DATA0 = 4,
FRAG_RESULT_MAX = (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
 } gl_frag_result;
 
@@ -1904,6 +1910,8 @@ typedef enum
SYSTEM_VALUE_FRONT_FACE,  /**< Fragment shader only (not done yet) */
SYSTEM_VALUE_VERTEX_ID,   /**< Vertex shader only */
SYSTEM_VALUE_INSTANCE_ID, /**< Vertex shader only */
+   SYSTEM_VALUE_SAMPLE_ID,   /**< Fragment shader only */
+   SYSTEM_VALUE_SAMPLE_POS,  /**< Fragment shader only */
SYSTEM_VALUE_MAX  /**< Number of values */
 } gl_system_value;
 
diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c
index cf85213..fa9063f 100644
--- a/src/mesa/program/prog_print.c
+++ b/src/mesa/program/prog_print.c
@@ -311,6 +311,7 @@ arb_output_attrib_string(GLint index, GLenum progType)
   "result.depth", /* FRAG_RESULT_DEPTH */
   "result.(one)", /* FRAG_RESULT_STENCIL */
   "result.color", /* FRAG_RESULT_COLOR */
+  "result.samplemask", /* FRAG_RESULT_SAMPLE_MASK */
   "result.color[0]", /* FRAG_RESULT_DATA0 (named for GLSL's gl_FragData) */
   "result.color[1]",
   "result.color[2]",
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH V3 02/12] mesa: Add new functions and enums required by GL_ARB_sample_shading

2013-10-31 Thread Anuj Phogat

New functions added by GL_ARB_sample_shading:
glMinSampleShadingARB()

New enums:
GL_SAMPLE_SHADING_ARB
GL_MIN_SAMPLE_SHADING_VALUE_ARB

V2: Update comments.
Create new GL4x.xml.
Remove redundant code in get.c.
Update the API_XML list in Makefile.am.
Add extra_gl40_ARB_sample_shading predicate to get.c.

V3:
   Fix make check failure.
   Add checks for desktop GL.
   Use GLfloat in place of GLclampf in glMinSampleShading().
Signed-off-by: Anuj Phogat 
Reviewed-by: Ken Graunke 
---
 src/mapi/glapi/gen/ARB_sample_shading.xml | 19 +++
 src/mapi/glapi/gen/GL4x.xml   | 21 +
 src/mapi/glapi/gen/Makefile.am|  5 -
 src/mapi/glapi/gen/gl_API.xml |  4 +++-
 src/mesa/main/enable.c| 18 ++
 src/mesa/main/get.c   |  8 
 src/mesa/main/get_hash_params.py  |  3 +++
 src/mesa/main/mtypes.h|  2 ++
 src/mesa/main/multisample.c   | 18 ++
 src/mesa/main/multisample.h   |  2 ++
 src/mesa/main/tests/dispatch_sanity.cpp   |  4 ++--
 11 files changed, 100 insertions(+), 4 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml
 create mode 100644 src/mapi/glapi/gen/GL4x.xml

diff --git a/src/mapi/glapi/gen/ARB_sample_shading.xml 
b/src/mapi/glapi/gen/ARB_sample_shading.xml
new file mode 100644
index 000..cc8296a
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_sample_shading.xml
@@ -0,0 +1,19 @@
+
+
+
+
+
+
+
+
+
+   
+   
+
+   
+  
+   
+
+
+
+
diff --git a/src/mapi/glapi/gen/GL4x.xml b/src/mapi/glapi/gen/GL4x.xml
new file mode 100644
index 000..6706278
--- /dev/null
+++ b/src/mapi/glapi/gen/GL4x.xml
@@ -0,0 +1,21 @@
+
+
+
+
+
+
+
+
+  
+  
+
+  
+
+  
+
+
+
+
+
+
+
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index 3902452..cbbf659 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -108,6 +108,7 @@ API_XML = \
ARB_invalidate_subdata.xml \
ARB_map_buffer_range.xml \
ARB_robustness.xml \
+   ARB_sample_shading.xml \
ARB_sampler_objects.xml \
ARB_seamless_cube_map.xml \
ARB_shader_atomic_counters.xml \
@@ -144,7 +145,9 @@ API_XML = \
NV_texture_barrier.xml \
NV_vdpau_interop.xml \
OES_EGL_image.xml \
-   GL3x.xml
+   GL3x.xml \
+   GL4x.xml
+
 
 
 COMMON = $(API_XML) \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 45d69b4..69014c5 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8187,7 +8187,7 @@
 http://www.w3.org/2001/XInclude"/>
 http://www.w3.org/2001/XInclude"/>
 
-
+http://www.w3.org/2001/XInclude"/>
 http://www.w3.org/2001/XInclude"/>
 http://www.w3.org/2001/XInclude"/>
 
@@ -13122,4 +13122,6 @@
 
 http://www.w3.org/2001/XInclude"/>
 
+http://www.w3.org/2001/XInclude"/>
+
 
diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
index 5e2fd80..dd6a772 100644
--- a/src/mesa/main/enable.c
+++ b/src/mesa/main/enable.c
@@ -802,6 +802,17 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, 
GLboolean state)
  ctx->Multisample.SampleCoverageInvert = state;
  break;
 
+  /* GL_ARB_sample_shading */
+  case GL_SAMPLE_SHADING:
+ if (!_mesa_is_desktop_gl(ctx))
+goto invalid_enum_error;
+ CHECK_EXTENSION(ARB_sample_shading, cap);
+ if (ctx->Multisample.SampleShading == state)
+return;
+ FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE);
+ ctx->Multisample.SampleShading = state;
+ break;
+
   /* GL_IBM_rasterpos_clip */
   case GL_RASTER_POSITION_UNCLIPPED_IBM:
  if (ctx->API != API_OPENGL_COMPAT)
@@ -1594,6 +1605,13 @@ _mesa_IsEnabled( GLenum cap )
  CHECK_EXTENSION(ARB_texture_multisample);
  return ctx->Multisample.SampleMask;
 
+  /* ARB_sample_shading */
+  case GL_SAMPLE_SHADING:
+ if (!_mesa_is_desktop_gl(ctx))
+goto invalid_enum_error;
+ CHECK_EXTENSION(ARB_sample_shading);
+ return ctx->Multisample.SampleShading;
+
   default:
  goto invalid_enum_error;
}
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 6e72ff5..6a0de0c 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -131,6 +131,7 @@ enum value_extra {
EXTRA_VERSION_30,
EXTRA_VERSION_31,
EXTRA_VERSION_32,
+   EXTRA_VERSION_40,
EXTRA_API_GL,
EXTRA_API_GL_CORE,
EXTRA_API_ES2,
@@ -391,6 +392,7 @@ extra_NV_primitive_restart[] = {
 static const int extra_version_30[] = { EXTRA_VERSION_30, EXTRA_END };
 static const int extra_version_31[] = { EXTRA_VERSION_31, EXTRA_END };
 static const int extra_version_32[] = { EXTRA_VERSION_32, EXTRA_END };
+static const int extra_version_40[] = { EXTRA_VERSION_40, EXTRA_END };
 
 static const int extra_gl30_es3[] = {
 EXTRA_VERSION_30,
@@ -410,6 +

[Mesa-dev] [PATCH V3 09/12] i965: Add FS backend for builtin gl_SampleMask[]

2013-10-31 Thread Anuj Phogat

V2:
   - Update comments
   - Add a special backend instructions to compute sample_mask.
   - Add a new variable uses_omask in brw_wm_prog_data.

V3:
   - Make changes to support simd16 mode.
   - Delete redundant AND instruction and handle the register
 stride in FS backend instruction.
Signed-off-by: Anuj Phogat 
Reviewed-by: Paul Berry 
---
 src/mesa/drivers/dri/i965/brw_context.h|  1 +
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs.h |  5 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 42 ++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 12 
 5 files changed, 61 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 665807c..bec4d6b 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -384,6 +384,7 @@ struct brw_wm_prog_data {
GLuint nr_pull_params;
bool dual_src_blend;
bool uses_pos_offset;
+   bool uses_omask;
uint32_t prog_offset_16;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index e03ac58..bfea88a 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -794,6 +794,7 @@ enum opcode {
FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
FS_OPCODE_MOV_DISPATCH_TO_FLAGS,
FS_OPCODE_DISCARD_JUMP,
+   FS_OPCODE_SET_OMASK,
FS_OPCODE_SET_SAMPLE_ID,
FS_OPCODE_SET_SIMD4X2_OFFSET,
FS_OPCODE_PACK_HALF_2x16_SPLIT,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 0ed8c05..e9bf52f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -443,6 +443,7 @@ public:
 
struct hash_table *variable_ht;
fs_reg frag_depth;
+   fs_reg sample_mask;
fs_reg outputs[BRW_MAX_DRAW_BUFFERS];
unsigned output_components[BRW_MAX_DRAW_BUFFERS];
fs_reg dual_src_output;
@@ -549,6 +550,10 @@ private:
  struct brw_reg offset);
void generate_mov_dispatch_to_flags(fs_inst *inst);
 
+   void generate_set_omask(fs_inst *inst,
+   struct brw_reg dst,
+   struct brw_reg sample_mask);
+
void generate_set_sample_id(fs_inst *inst,
struct brw_reg dst,
struct brw_reg src0,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2553098..d67c4e4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1068,6 +1068,44 @@ fs_generator::generate_set_simd4x2_offset(fs_inst *inst,
brw_pop_insn_state(p);
 }
 
+/* Sets vstride=16, width=8, hstride=2 or vstride=0, width=1, hstride=0
+ * (when mask is passed as a uniform) of register mask before moving it
+ * to register dst.
+ */
+void
+fs_generator::generate_set_omask(fs_inst *inst,
+ struct brw_reg dst,
+ struct brw_reg mask)
+{
+   bool stride_8_8_1 =
+(mask.vstride == BRW_VERTICAL_STRIDE_8 &&
+ mask.width == BRW_WIDTH_8 &&
+ mask.hstride == BRW_HORIZONTAL_STRIDE_1);
+
+   bool stride_0_1_0 =
+(mask.vstride == BRW_VERTICAL_STRIDE_0 &&
+ mask.width == BRW_WIDTH_1 &&
+ mask.hstride == BRW_HORIZONTAL_STRIDE_0);
+
+   assert(stride_8_8_1 || stride_0_1_0);
+   assert(dst.type == BRW_REGISTER_TYPE_UW);
+
+   if (dispatch_width == 16)
+  dst = vec16(dst);
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+
+   if (stride_8_8_1) {
+  brw_MOV(p, dst, stride(retype(brw_vec1_reg(mask.file, mask.nr, 0),
+dst.type), 16, 8, 2));
+   } else if (stride_0_1_0) {
+  brw_MOV(p, dst, stride(retype(brw_vec1_reg(mask.file, mask.nr, 0),
+dst.type), 0, 1, 0));
+   }
+   brw_pop_insn_state(p);
+}
+
 /* Sets vstride=1, width=4, hstride=0 of register src1 during
  * the ADD instruction.
  */
@@ -1666,6 +1704,10 @@ fs_generator::generate_code(exec_list *instructions)
  generate_set_simd4x2_offset(inst, dst, src[0]);
  break;
 
+  case FS_OPCODE_SET_OMASK:
+ generate_set_omask(inst, dst, src[0]);
+ break;
+
   case FS_OPCODE_SET_SAMPLE_ID:
  generate_set_sample_id(inst, dst, src[0], src[1]);
  break;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 0993637..dd60671 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -82,6 +82,8 @@ fs_visitor::visit(ir_variable *ir)
 }
   } else if (ir->location == FRAG_RESULT_DEPTH) {
 this->frag_depth = *reg;
+  } else

[Mesa-dev] [PATCH V3 03/12] mesa: Pass number of samples as a program state variable

2013-10-31 Thread Anuj Phogat

Number of samples will be required in fragment shader program by new
GLSL builtin uniform "gl_NumSamples".

V2: Use "state.numsamples" in place of "state.num.samples"
Use _NEW_BUFFERS flag in place of _NEW_MULTISAMPLE

Signed-off-by: Anuj Phogat 
Reviewed-by: Ian Romanick 
Reviewed-by: Ken Graunke 
Reviewed-by: Paul Berry 
---
 src/mesa/program/prog_statevars.c | 11 +++
 src/mesa/program/prog_statevars.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/src/mesa/program/prog_statevars.c 
b/src/mesa/program/prog_statevars.c
index 145c07c..f6fd535 100644
--- a/src/mesa/program/prog_statevars.c
+++ b/src/mesa/program/prog_statevars.c
@@ -349,6 +349,9 @@ _mesa_fetch_state(struct gl_context *ctx, const 
gl_state_index state[],
  }
   }
   return;
+   case STATE_NUM_SAMPLES:
+  ((int *)value)[0] = ctx->DrawBuffer->Visual.samples;
+  return;
case STATE_DEPTH_RANGE:
   value[0] = ctx->Viewport.Near; /* near   */
   value[1] = ctx->Viewport.Far;  /* far*/
@@ -665,6 +668,9 @@ _mesa_program_state_flags(const gl_state_index 
state[STATE_LENGTH])
case STATE_PROGRAM_MATRIX:
   return _NEW_TRACK_MATRIX;
 
+   case STATE_NUM_SAMPLES:
+  return _NEW_BUFFERS;
+
case STATE_DEPTH_RANGE:
   return _NEW_VIEWPORT;
 
@@ -852,6 +858,9 @@ append_token(char *dst, gl_state_index k)
case STATE_TEXENV_COLOR:
   append(dst, "texenv");
   break;
+   case STATE_NUM_SAMPLES:
+  append(dst, "numsamples");
+  break;
case STATE_DEPTH_RANGE:
   append(dst, "depth.range");
   break;
@@ -1027,6 +1036,8 @@ _mesa_program_state_string(const gl_state_index 
state[STATE_LENGTH])
   break;
case STATE_FOG_COLOR:
   break;
+   case STATE_NUM_SAMPLES:
+  break;
case STATE_DEPTH_RANGE:
   break;
case STATE_FRAGMENT_PROGRAM:
diff --git a/src/mesa/program/prog_statevars.h 
b/src/mesa/program/prog_statevars.h
index ec22b73..d648ed2 100644
--- a/src/mesa/program/prog_statevars.h
+++ b/src/mesa/program/prog_statevars.h
@@ -103,6 +103,8 @@ typedef enum gl_state_index_ {
 
STATE_TEXENV_COLOR,
 
+   STATE_NUM_SAMPLES,/* An interger, not a float like the other state vars 
*/
+
STATE_DEPTH_RANGE,
 
STATE_VERTEX_PROGRAM,
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 71110] New: xorg_driver.c:1030:2: error: too many arguments to function ‘DamageUnregister’

2013-10-31 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=71110

  Priority: medium
Bug ID: 71110
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: xorg_driver.c:1030:2: error: too many arguments to
function ‘DamageUnregister’
  Severity: blocker
Classification: Unclassified
OS: Linux (All)
  Reporter: v...@freedesktop.org
  Hardware: x86-64 (AMD64)
Status: NEW
   Version: git
 Component: Other
   Product: Mesa

mesa: 723f047a3bc21b536d27ab1c0d6cc82f93002648 (master)
xorg-x11-server-devel: xorg-x11-server-devel-1.14.99.3-1.20131021.fc21.x86_64

$ scons
[...]
  Compiling src/gallium/state_trackers/xorg/xorg_driver.c ...
src/gallium/state_trackers/xorg/xorg_driver.c: In function ‘drv_pre_init’:
src/gallium/state_trackers/xorg/xorg_driver.c:352:23: warning: variable
‘xf86_config’ set but not used [-Wunused-but-set-variable]
 xf86CrtcConfigPtr xf86_config;
   ^
src/gallium/state_trackers/xorg/xorg_driver.c: In function ‘drv_close_screen’:
src/gallium/state_trackers/xorg/xorg_driver.c:1030:2: warning: passing argument
1 of ‘DamageUnregister’ from incompatible pointer type [enabled by default]
  DamageUnregister(&pScreen->GetScreenPixmap(pScreen)->drawable, ms->damage);
  ^
In file included from /usr/include/xorg/pixmapstr.h:53:0,
 from /usr/include/xorg/xf86str.h:40,
 from /usr/include/xorg/xf86.h:44,
 from src/gallium/state_trackers/xorg/xorg_driver.c:33:
/usr/include/xorg/damage.h:77:2: note: expected ‘DamagePtr’ but argument is of
type ‘struct DrawableRec *’
  DamageUnregister(DamagePtr pDamage);
  ^
src/gallium/state_trackers/xorg/xorg_driver.c:1030:2: error: too many arguments
to function ‘DamageUnregister’
  DamageUnregister(&pScreen->GetScreenPixmap(pScreen)->drawable, ms->damage);
  ^
In file included from /usr/include/xorg/pixmapstr.h:53:0,
 from /usr/include/xorg/xf86str.h:40,
 from /usr/include/xorg/xf86.h:44,
 from src/gallium/state_trackers/xorg/xorg_driver.c:33:
/usr/include/xorg/damage.h:77:2: note: declared here
  DamageUnregister(DamagePtr pDamage);
  ^

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 71110] xorg_driver.c:1030:2: error: too many arguments to function ‘DamageUnregister’

2013-10-31 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=71110

Vinson Lee  changed:

   What|Removed |Added

 CC||a...@nwnk.net,
   ||e...@anholt.net

--- Comment #1 from Vinson Lee  ---
commit d08966227e7d567df8d26eebc80f35f886e59a4a
Author: Adam Jackson 
Date:   Mon Aug 26 14:03:51 2013 -0400

damage: Simplify DamageUnregister

You can only register one drawable on a given damage, so there's no
reason to require the caller to specify the drawable, the damage is
enough.  The implementation would do something fairly horrible if you
_did_ pass mismatched drawable and damage, so let's avoid the problem
entirely.

v2: Simplify xf86RotateDestroy even more [anholt]

Reviewed-by: Chris Wilson 
Reviewed-by: Eric Anholt 
Signed-off-by: Adam Jackson 

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] dri: Add driconf option clamp_max_samples

2013-10-31 Thread Chad Versace

This clamps GL_MAX_SAMPLES to the given value. If negative, then no
clamping occurs.

This patch adds the option, but no driver respects it yet.

CC: Eric Anholt 
Signed-off-by: Chad Versace 
---

This little series lives on my driconf-clamp-max-samples branch.


 src/mesa/drivers/dri/common/xmlpool/t_options.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/common/xmlpool/t_options.h 
b/src/mesa/drivers/dri/common/xmlpool/t_options.h
index 3bf804a..2a02406 100644
--- a/src/mesa/drivers/dri/common/xmlpool/t_options.h
+++ b/src/mesa/drivers/dri/common/xmlpool/t_options.h
@@ -215,7 +215,10 @@ DRI_CONF_OPT_BEGIN_V(pp_jimenezmlaa_color,int,def, # min 
":" # max ) \
 DRI_CONF_DESC(en,gettext("Morphological anti-aliasing based on 
Jimenez\\\' MLAA. 0 to disable, 8 for default quality. Color version, usable 
with 2d GL apps")) \
 DRI_CONF_OPT_END
 
-
+#define DRI_CONF_CLAMP_MAX_SAMPLES() \
+DRI_CONF_OPT_BEGIN(clamp_max_samples, int, -1) \
+DRI_CONF_DESC(en,gettext("Clamp the value of GL_MAX_SAMPLES to the 
given integer. If negative, then do not clamp.")) \
+DRI_CONF_OPT_END
 
 /**
  * \brief Performance-related options
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] i965: Respect driconf option clamp_max_samples

2013-10-31 Thread Chad Versace

Clamp gen7 GL_MAX_SAMPLES to 0, 4, or 8.
Clamp gen6 GL_MAX_SAMPLES to 0 or 4.
Clamp other gens to 0.

CC: Eric Anholt 
Signed-off-by: Chad Versace 
---
 src/mesa/drivers/dri/i965/brw_context.c  | 35 +++-
 src/mesa/drivers/dri/i965/intel_screen.c |  1 +
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 38147e9..60a201f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -273,6 +273,10 @@ brw_initialize_context_constants(struct brw_context *brw)
 {
struct gl_context *ctx = &brw->ctx;
 
+   uint32_t max_samples;
+   const uint32_t *legal_max_samples;
+   uint32_t clamp_max_samples;
+
ctx->Const.QueryCounterBits.Timestamp = 36;
 
ctx->Const.StripTextureBorder = true;
@@ -333,19 +337,30 @@ brw_initialize_context_constants(struct brw_context *brw)
 
ctx->Const.AlwaysUseGetTransformFeedbackVertexCount = true;
 
-   if (brw->gen == 6) {
-  ctx->Const.MaxSamples = 4;
-  ctx->Const.MaxColorTextureSamples = 4;
-  ctx->Const.MaxDepthTextureSamples = 4;
-  ctx->Const.MaxIntegerSamples = 4;
-   } else if (brw->gen >= 7) {
-  ctx->Const.MaxSamples = 8;
-  ctx->Const.MaxColorTextureSamples = 8;
-  ctx->Const.MaxDepthTextureSamples = 8;
-  ctx->Const.MaxIntegerSamples = 8;
+   if (brw->gen >= 7) {
+  legal_max_samples = (uint32_t[]){8, 4, 0};
   ctx->Const.MaxProgramTextureGatherComponents = 4;
+   } else if (brw->gen == 6) {
+  legal_max_samples = (uint32_t[]){4, 0};
+   } else {
+  legal_max_samples = (uint32_t[]){0};
+   }
+
+   /* Set max_samples = min(max(legal_max_samples), clamp_max_samples). */
+   max_samples = 0;
+   clamp_max_samples = driQueryOptioni(&brw->optionCache, "clamp_max_samples");
+   for (uint32_t i = 0; legal_max_samples[i] != 0; ++i) {
+  if (legal_max_samples[i] <= clamp_max_samples) {
+ max_samples = legal_max_samples[i];
+ break;
+  }
}
 
+   ctx->Const.MaxSamples = max_samples;
+   ctx->Const.MaxColorTextureSamples = max_samples;
+   ctx->Const.MaxDepthTextureSamples = max_samples;
+   ctx->Const.MaxIntegerSamples = max_samples;
+
ctx->Const.MinLineWidth = 1.0;
ctx->Const.MinLineWidthAA = 1.0;
ctx->Const.MaxLineWidth = 5.0;
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index eafafa2..0e991ec 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -67,6 +67,7 @@ DRI_CONF_BEGIN
DRI_CONF_SECTION_END
DRI_CONF_SECTION_QUALITY
   DRI_CONF_FORCE_S3TC_ENABLE("false")
+  DRI_CONF_CLAMP_MAX_SAMPLES()
DRI_CONF_SECTION_END
DRI_CONF_SECTION_DEBUG
   DRI_CONF_NO_RAST("false")
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/5] i965: Add a pass to remove dead control flow.

2013-10-31 Thread Matt Turner

Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.

total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
instructions in affected programs: 157 -> 151 (-3.82%)

(no change in vertex shaders)

Reviewed-by: Paul Berry  [v1]
v2: Moved before SEL peephole in series. Made function useful for VS.
---
 src/mesa/drivers/dri/i965/Makefile.sources |  1 +
 .../drivers/dri/i965/brw_dead_control_flow.cpp | 80 ++
 src/mesa/drivers/dri/i965/brw_dead_control_flow.h  | 26 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  2 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  2 +
 5 files changed, 111 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp
 create mode 100644 src/mesa/drivers/dri/i965/brw_dead_control_flow.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 14b2f61..06ac843 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -44,6 +44,7 @@ i965_FILES = \
brw_context.c \
brw_cubemap_normalize.cpp \
brw_curbe.c \
+   brw_dead_control_flow.cpp \
brw_device_info.c \
brw_disasm.c \
brw_draw.c \
diff --git a/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp 
b/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp
new file mode 100644
index 000..d509a93
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_dead_control_flow.cpp
@@ -0,0 +1,80 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+/** @file brw_dead_control_flow.cpp
+ *
+ * This file implements the dead control flow elimination optimization pass.
+ */
+
+#include "brw_shader.h"
+#include "brw_cfg.h"
+
+/* Look for and eliminate dead control flow:
+ *
+ *   - if/endif
+ *   - if/else/endif
+ */
+bool
+dead_control_flow_eliminate(backend_visitor *v)
+{
+   bool progress = false;
+
+   cfg_t cfg(v);
+
+   for (int b = 0; b < cfg.num_blocks; b++) {
+  bblock_t *block = cfg.blocks[b];
+  bool found = false;
+
+  /* ENDIF instructions, by definition, can only be found at the ends of
+   * basic blocks.
+   */
+  backend_instruction *endif_inst = block->end;
+  if (endif_inst->opcode != BRW_OPCODE_ENDIF)
+ continue;
+
+  backend_instruction *if_inst = NULL, *else_inst = NULL;
+  backend_instruction *prev_inst = (backend_instruction *) 
endif_inst->prev;
+  if (prev_inst->opcode == BRW_OPCODE_IF) {
+ if_inst = prev_inst;
+ found = true;
+  } else if (prev_inst->opcode == BRW_OPCODE_ELSE) {
+ else_inst = prev_inst;
+
+ prev_inst = (backend_instruction *) prev_inst->prev;
+ if (prev_inst->opcode == BRW_OPCODE_IF) {
+if_inst = prev_inst;
+found = true;
+ }
+  }
+
+  if (found) {
+ if_inst->remove();
+ if (else_inst)
+else_inst->remove();
+ endif_inst->remove();
+ progress = true;
+  }
+   }
+
+   return progress;
+}
diff --git a/src/mesa/drivers/dri/i965/brw_dead_control_flow.h 
b/src/mesa/drivers/dri/i965/brw_dead_control_flow.h
new file mode 100644
index 000..57a4dab
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_dead_control_flow.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permi

[Mesa-dev] [PATCH 4/5] i965/fs: Extend SEL peephole to handle only matching MOVs.

2013-10-31 Thread Matt Turner

Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:

   (+f0) IF
   MOV dst0 ...
   MOV dst1 ...
   MOV dst2 ...
   ELSE
   MOV dst0 ...
   MOV dst1 ...
   MOV dst3 ...
   ENDIF

This commit extends the peephole to handle this case.

No shader-db changes.
---
This patch made a much bigger difference when pulling instructions from the
ends of the "then" and "else" blocks, because often we were able to compute
to MRF.

 src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
index 9626751..c875487 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
@@ -142,7 +142,6 @@ fs_visitor::opt_peephole_sel()
 
   fs_inst *else_mov[MAX_MOVS] = { NULL };
   fs_inst *then_mov[MAX_MOVS] = { NULL };
-  bool malformed_mov_found = false;
 
   int movs = count_movs_from_if(then_mov, else_mov, if_inst, else_inst);
 
@@ -161,7 +160,7 @@ fs_visitor::opt_peephole_sel()
  if (!then_mov[i]->dst.equals(else_mov[i]->dst) ||
  then_mov[i]->is_partial_write() ||
  else_mov[i]->is_partial_write()) {
-malformed_mov_found = true;
+movs = i;
 break;
  }
 
@@ -188,7 +187,7 @@ fs_visitor::opt_peephole_sel()
  }
   }
 
-  if (malformed_mov_found)
+  if (movs == 0)
  continue;
 
   /* Emit a CMP if our IF used the embedded comparison */
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/5] i965: Keep pointers to IF/ELSE/ENDIF instructions in the cfg.

2013-10-31 Thread Matt Turner

Useful for finding the associated control flow instructions, given a
block ending in one.
---
 src/mesa/drivers/dri/i965/brw_cfg.cpp | 34 ++
 src/mesa/drivers/dri/i965/brw_cfg.h   | 10 ++
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp 
b/src/mesa/drivers/dri/i965/brw_cfg.cpp
index e9d2bb8..7e0e9a9 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
@@ -28,7 +28,7 @@
 #include "brw_fs.h"
 #include "brw_cfg.h"
 
-/** @file brw_cfg_t.cpp
+/** @file brw_cfg.cpp
  *
  * Walks the shader instructions generated and creates a set of basic
  * blocks with successor/predecessor edges connecting them.
@@ -52,6 +52,10 @@ bblock_t::bblock_t() :
 
parents.make_empty();
children.make_empty();
+
+   if_inst = NULL;
+   else_inst = NULL;
+   endif_inst = NULL;
 }
 
 void
@@ -90,6 +94,7 @@ cfg_t::create(void *parent_mem_ctx, exec_list *instructions)
bblock_t *entry = new_block();
bblock_t *cur_if = NULL, *cur_else = NULL, *cur_endif = NULL;
bblock_t *cur_do = NULL, *cur_while = NULL;
+   backend_instruction *if_inst = NULL, *else_inst = NULL, *endif_inst = NULL;
exec_list if_stack, else_stack, endif_stack, do_stack, while_stack;
bblock_t *next;
 
@@ -121,6 +126,10 @@ cfg_t::create(void *parent_mem_ctx, exec_list 
*instructions)
  */
 cur_endif = new_block();
 
+ if_inst = cur->end;
+ else_inst = NULL;
+ endif_inst = NULL;
+
 /* Set up our immediately following block, full of "then"
  * instructions.
  */
@@ -134,6 +143,8 @@ cfg_t::create(void *parent_mem_ctx, exec_list *instructions)
   case BRW_OPCODE_ELSE:
 cur->add_successor(mem_ctx, cur_endif);
 
+ else_inst = cur->end;
+
 next = new_block();
 next->start = (backend_instruction *)inst->next;
 cur_if->add_successor(mem_ctx, next);
@@ -142,20 +153,35 @@ cfg_t::create(void *parent_mem_ctx, exec_list 
*instructions)
 set_next_block(next);
 break;
 
-  case BRW_OPCODE_ENDIF:
+  case BRW_OPCODE_ENDIF: {
 cur_endif->start = (backend_instruction *)inst->next;
 cur->add_successor(mem_ctx, cur_endif);
+
+ endif_inst = cur->end;
 set_next_block(cur_endif);
 
-if (!cur_else)
+ cur_if->if_inst = if_inst;
+ cur_if->else_inst = else_inst;
+ cur_if->endif_inst = endif_inst;
+
+if (!cur_else) {
cur_if->add_successor(mem_ctx, cur_endif);
+ } else {
+cur_else->if_inst = if_inst ;
+cur_else->else_inst = else_inst;
+cur_else->endif_inst = endif_inst;
+ }
+
+ cur->if_inst = if_inst;
+ cur->else_inst = else_inst;
+ cur->endif_inst = endif_inst;
 
 /* Pop the stack so we're in the previous if/else/endif */
 cur_if = pop_stack(&if_stack);
 cur_else = pop_stack(&else_stack);
 cur_endif = pop_stack(&endif_stack);
 break;
-
+  }
   case BRW_OPCODE_DO:
 /* Push our information onto a stack so we can recover from
  * nested loops.
diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h 
b/src/mesa/drivers/dri/i965/brw_cfg.h
index ec5a3a0..2ea492e 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -56,6 +56,16 @@ public:
exec_list parents;
exec_list children;
int block_num;
+
+   /* If the current basic block ends in an IF, ELSE, or ENDIF instruction,
+* these pointers will hold the locations of the basic blocks ending in the
+* other associated control flow instructions.
+*
+* Otherwise they are NULL.
+*/
+   backend_instruction *if_inst;
+   backend_instruction *else_inst;
+   backend_instruction *endif_inst;
 };
 
 class cfg_t {
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/5] i965/fs: New peephole optimization to generate SEL.

2013-10-31 Thread Matt Turner

fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).

This peephole fixes both of these and operates as a normal optimization
pass.

fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.

total instructions in shared programs: 1417596 -> 1405551 (-0.85%)
instructions in affected programs: 168590 -> 156545 (-7.14%)
GAINED:10
LOST:  4
---
 src/mesa/drivers/dri/i965/Makefile.sources|   1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp  |   1 +
 src/mesa/drivers/dri/i965/brw_fs.h|   1 +
 src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp | 217 ++
 4 files changed, 220 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 06ac843..79ed3dd 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -60,6 +60,7 @@ i965_FILES = \
brw_fs_fp.cpp \
brw_fs_generator.cpp \
brw_fs_live_variables.cpp \
+   brw_fs_sel_peephole.cpp \
brw_fs_reg_allocate.cpp \
brw_fs_vector_splitting.cpp \
brw_fs_visitor.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index df17684..345dc19 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3107,6 +3107,7 @@ fs_visitor::run()
 progress = opt_algebraic() || progress;
 progress = opt_cse() || progress;
 progress = opt_copy_propagate() || progress;
+ progress = opt_peephole_sel() || progress;
 progress = dead_code_eliminate() || progress;
 progress = dead_code_eliminate_local() || progress;
  progress = dead_control_flow_eliminate(this) || progress;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 849e49f..a592dda 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -363,6 +363,7 @@ public:
bool try_emit_saturate(ir_expression *ir);
bool try_emit_mad(ir_expression *ir, int mul_arg);
void try_replace_with_sel();
+   bool opt_peephole_sel();
void emit_bool_to_cond_code(ir_rvalue *condition);
void emit_if_gen6(ir_if *ir);
void emit_unspill(fs_inst *inst, fs_reg reg, uint32_t spill_offset,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
new file mode 100644
index 000..9626751
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
@@ -0,0 +1,217 @@
+/*
+ * Copyright © 2013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_fs.h"
+#include "brw_cfg.h"
+
+/** @file brw_fs_sel_peephole.cpp
+ *
+ * This file contains the opt_peephole_sel() optimization pass that replaces
+ * MOV instructions to the same destination in the "then" and "else" bodies of
+ * an if statement with SEL instructions.
+ */
+
+#define MAX_MOVS 8 /**< The maximum number of MOVs to attempt to match. */
+
+/**
+ * Scans forwards from an IF counting consecutive MOV instructions in the
+ * "then" and "else" blocks of the if statement.
+ *
+ * A pointer to the fs_inst* for IF is passed as the  argument. The
+ * function stores pointers to the MOV instructions in the  and
+ *  arrays.
+ *
+ * \return the minimum number of MOVs found in the two branches or zero if
+ * an error occurred.
+ *
+ * E.g.:
+ *  IF ...
+ *the

[Mesa-dev] [PATCH 5/5] i965/fs: Emit a MOV instead of a SEL if the sources are the same.

2013-10-31 Thread Matt Turner

One program affected.

instructions in affected programs: 436 -> 428 (-1.83%)

Reviewed-by: Paul Berry 
---
 src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp | 42 +--
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
index c875487..15027e0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_sel_peephole.cpp
@@ -164,26 +164,30 @@ fs_visitor::opt_peephole_sel()
 break;
  }
 
- /* Only the last source register can be a constant, so if the MOV in
-  * the "then" clause uses a constant, we need to put it in a
-  * temporary.
-  */
- fs_reg src0(then_mov[i]->src[0]);
- if (src0.file == IMM) {
-src0 = fs_reg(this, glsl_type::float_type);
-src0.type = then_mov[i]->src[0].type;
-mov_imm_inst[i] = MOV(src0, then_mov[i]->src[0]);
- }
-
- sel_inst[i] = SEL(then_mov[i]->dst, src0, else_mov[i]->src[0]);
-
- if (brw->gen == 6 && if_inst->conditional_mod) {
-/* For Sandybridge with IF with embedded comparison */
-sel_inst[i]->predicate = BRW_PREDICATE_NORMAL;
+ if (!then_mov[i]->src[0].equals(else_mov[i]->src[0])) {
+/* Only the last source register can be a constant, so if the MOV
+ * in the "then" clause uses a constant, we need to put it in a
+ * temporary.
+ */
+fs_reg src0(then_mov[i]->src[0]);
+if (src0.file == IMM) {
+   src0 = fs_reg(this, glsl_type::float_type);
+   src0.type = then_mov[i]->src[0].type;
+   mov_imm_inst[i] = MOV(src0, then_mov[i]->src[0]);
+}
+
+sel_inst[i] = SEL(then_mov[i]->dst, src0, else_mov[i]->src[0]);
+
+if (brw->gen == 6 && if_inst->conditional_mod) {
+   /* For Sandybridge with IF with embedded comparison */
+   sel_inst[i]->predicate = BRW_PREDICATE_NORMAL;
+} else {
+   /* Separate CMP and IF instructions */
+   sel_inst[i]->predicate = if_inst->predicate;
+   sel_inst[i]->predicate_inverse = if_inst->predicate_inverse;
+}
  } else {
-/* Separate CMP and IF instructions */
-sel_inst[i]->predicate = if_inst->predicate;
-sel_inst[i]->predicate_inverse = if_inst->predicate_inverse;
+sel_inst[i] = MOV(then_mov[i]->dst, then_mov[i]->src[0]);
  }
   }
 
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

56 matches

Mail list logo