On 11/15/2011 07:11 AM, Vadim Girlin wrote:
On Tue, 2011-11-15 at 06:48 -0800, Jose Fonseca wrote:

----- Original Message -----
On Tue, 2011-11-15 at 05:52 -0800, Jose Fonseca wrote:

----- Original Message -----
On 11/14/2011 07:16 AM, Marek Olšák wrote:
On Mon, Nov 14, 2011 at 2:49 PM, Vadim
Girlin<vadimgir...@gmail.com>   wrote:
On Mon, 2011-11-14 at 05:22 -0800, Jose Fonseca wrote:

----- Original Message -----
Hi,

I found some problem with glsl_to_tgsi: remove_output_reads
function.
It's replacing outputs with temps, producing incorrect
results
with
relative addressing. You can see it e.g. with
"vs-varying-array-mat2-col-rd.shader_test". Here is a dump:

VERT
DCL IN[0]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[12]
DCL OUT[2], GENERIC[13]
DCL OUT[3], GENERIC[14]
DCL OUT[4], GENERIC[15]
DCL OUT[5], GENERIC[16]
DCL OUT[6], GENERIC[17]
DCL OUT[7], GENERIC[18]
DCL CONST[0..5]
DCL TEMP[0..1]
DCL ADDR[0]
IMM FLT32 {    1.0000,     2.0000,     3.0000,     4.0000}
IMM FLT32 {    5.0000,     6.0000,     7.0000,     8.0000}
IMM FLT32 {    9.0000,    10.0000,    11.0000,    12.0000}
IMM FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
    0: MUL TEMP[0], CONST[2], IN[0].xxxx
    1: MAD TEMP[1], CONST[3], IN[0].yyyy, TEMP[0]
    2: MAD TEMP[1], CONST[4], IN[0].zzzz, TEMP[1]
    3: MAD OUT[0], CONST[5], IN[0].wwww, TEMP[1]
    4: MOV OUT[2], IMM[0].xyyy
    5: MOV OUT[3], IMM[0].zwww
    6: MOV TEMP[0], IMM[1].xyyy

OUT[2-7] is a "varying mat2x2[3] m;", OUT[4] is replaced
with
the
temp
in the instruction 6.

    7: MOV OUT[5], IMM[1].zwww
    8: MOV OUT[6], IMM[2].xyyy
    9: MOV OUT[7], IMM[2].zwww
   10: ARL ADDR[0].x, CONST[1].xxxx
   11: SNE TEMP[1], TEMP[ADDR[0].x].xyyy, CONST[0].xyyy

Instruction 11 contains the read with the relative
addressing
using
this temp, which is incorrect.

I'm not sure how to fix it,

The way to fix this is to allocate a consecutive range of
temps
at start, when there are indirect writes to output registers
and
at least one read.

but AFAICS at least for r600g this step
could be skipped completely - r600 can read outputs without
any
problem, they are located in the general-purpose registers.
Removing
calls to remove_output_reads and assert(src.File !=
TGSI_FILE_OUTPUT)
in the ureg_emit_src produces correct result and test passes
on
evergreen (total number of fixed tests is about 60).

Probably it makes sense to make this step optional and ask
the
driver
whether to use it, if I'm not missing something?

The drawback of doing this is that TGSI will look even more
different between drivers. This means that when somebody
makes a
change to the state tracker, and tested with one driver, it
may
break other drivers. This also means that comparing a driver
to
llvmpipe/softpipe will be less meaningfull, as different
paths
will be taken in the state tracker.


I think if it's needed to compare the drivers, than it's
possible
to
switch of the the cap for debugging. I see the drawbacks, but
this
is
also about performance. Currently with r600g (and possibly
with
other
drivers) we have to spend some time for unneccessary shader
modification, to get less efficient shader code as a result.

I am 100% sure nobody will turn on/off CAPs just to test
something.

I gotta agree with José on this one.

I guess I don't follow.  Different hardware can do different
things,
and
the code for that hardware will look different.  What's the
problem?
  It
seems silly to spend CPU time rearranging the code and then
hoping
the
driver will spend more CPU time to put it back the way it was.
  People
using these drivers *do* care about CPU performance, after all.
:)

Developer time is important too. And having more code paths shared
with other drivers (even at the expense of a few extra CPU cycles
every time a shader is created) means that developers has more time
to
focus on features that can yield substantial improvements on true
hotspots (e.g., every time a pixel is rendered).

This particular case may not be the best example. But there is a
trade
off: more specialization means more maintenance burden.

And merely sharing the C code is not enough: if one is not sharing
the
same code paths, then's not really sharing the code. It's looks
like
but it's not: it may be using an unique path to its driver, which
means that other developers may never test it so it can get easily
broken, and vice versa.

The gallium interface often makes such compromises.  And every now
and
then I see commits from Intel yanking some obscure optimization
code
path on i965 that's not worth trouble of maintaining, so I suppose
you
do to.

Again, I understand what you are talking about, but these patches are
adding about 5 lines of code, I defininitely can't imagine any
troubles
of maintaining them. I'm not trying to pull into state tracker huge
hardware-specific patch, e.g. implement 5-way vliw scheduling here,
or
some shader transformation because my hardware can't read something.
I'm
just trying to _avoid_ some hardware-specific code in the state
tracker,
which is created for another hardware and is just a trouble for for
my
hardware.

If everybody wants to make state tracker less hardware-specific, then
I
don't understand why this hardware-specific transformation code is in
the state tracker at all?

I'm not going to insist on my proposal, but if you're trying to
expalin
why these patches are evil, you probably need a better explanation.

I'm pointing out the drawbacks, but I never said these patches are evil.

I did not spend enough time to make my own mind about this patches, and 
whomever maintains r600g is free to decide either way as far as I'm concerned.  
From the beginning I wrote:

   "I'm not sure if for this particular issue a new cap is worth or not. Just 
pointing out that there are downsides of breaking orthogonality between state tracker and 
driver. It should not be done lightly."

I feel compelled to reply whenever somebody says "there's no loss with adding this cap 
here" with "yes there is, and here's why...".

But of course I would be both thumbs up if it was a patch that would fix the 
fundamental bug for all drivers.

It seems you are mixing two _different_ issues - the first is incorrect
handling of the relative addressing, and the second is unnecessary for
some drivers code which is spending CPU time for nothing, not to mention
additional bugs. My proposal was intended to fix the second, I'm not
proposing to never fix the first.

That's why I suggested removing reads from outputs at the GLSL IR level. This would take care of both problems at once. For drivers that can't handle reads from outputs, TGSI code would never be generated that contains such reads. For drivers that can handle it, there would be no buggy TGSI lowering pass to contend with.

Sorry if I misunderstood something, english is not my native language.

Vadim
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to