fs: Fetch one cacheline of pull constants at a time.

Kenneth Graunke Tue, 13 Dec 2016 01:22:45 -0800

On Friday, December 9, 2016 11:03:29 AM PST Francisco Jerez wrote:
> Asking the DC for less than one cacheline (4 owords) of data for
> uniform pull constants is suboptimal because the DC cannot request
> less than that from L3, resulting in wasted bandwidth and unnecessary
> message dispatch overhead, and exacerbating the IVB L3 serialization
> bug.  The following table summarizes the overall framerate improvement
> (with statistical significance of 5% and sample size ~10) from the
> whole series up to this patch for several benchmarks and hardware
> generations:
> 
>                          | SKL           | BDW          | HSW
> SynMark2 OglShMapPcf     | 24.63% ±0.45% | 4.01% ±0.70% | 10.31% ±0.38%
> GfxBench4 gl_manhattan31 |  5.93% ±0.35% | 3.92% ±0.31% |  6.62% ±0.22%
> GfxBench4 gl_4           |  2.52% ±0.44% | 1.23% ±0.10% |      N/A
> Unigine Valley           |  0.83% ±0.17% | 0.23% ±0.05% |  0.74% ±0.45%


I suspect OglShMapPcf gained SIMD16 on Skylake due to reduced register
pressure, from the lower message lengths on pull loads.  (At least, it
did when I had a series to fix that.)  That's probably a large portion
of the performance improvement here, and why it's so much larger for
that workload on Skylake specifically.  It might be worth mentioning it
in your commit message here.

Thanks for all your work on this.  I was originally concerned about the
Ivybridge bug, but given that we're loading one cacheline at a time, it
seems very unlikely that we'd ever load the same cacheline twice within
16 cycles.  It could still happen with something like
IF (non-uniform) load[foo] ELSE load[bar], where foo and bar are indirect
expressions that happen to be equal, but that seems quite uncommon.
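
To make the cacheline arithmetic concrete, here is a minimal sketch in C.
It is not code from the series: the helper names cacheline_base() and
same_cacheline() are hypothetical, and it only assumes the sizes stated in
the commit message (one cacheline = 4 owords, one oword = 16 bytes).  It
shows how a byte offset into the pull constant buffer maps to the cacheline
that would be fetched, and when two indirect offsets end up hitting the
same cacheline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sizes taken from the commit message: a cacheline is 4 owords, and an
 * oword is 16 bytes, so a cacheline is 64 bytes. */
#define OWORD_SIZE     16u
#define CACHELINE_SIZE (4u * OWORD_SIZE)

/* Hypothetical helper: round a byte offset down to the start of the
 * cacheline containing it.  Works because CACHELINE_SIZE is a power
 * of two. */
static uint32_t cacheline_base(uint32_t byte_offset)
{
    return byte_offset & ~(CACHELINE_SIZE - 1u);
}

/* Hypothetical helper: true if two byte offsets fall in the same
 * cacheline, i.e. two loads at these offsets would request the same
 * 64 bytes. */
static bool same_cacheline(uint32_t a, uint32_t b)
{
    return cacheline_base(a) == cacheline_base(b);
}
```

For example, same_cacheline(16, 48) is true while same_cacheline(48, 80) is
false, so in the IF/ELSE case above the two loads only hit the same
cacheline when foo and bar land within the same 64-byte block.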

Series is:
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>


Re: [Mesa-dev] [PATCH 6/9] i965/fs: Fetch one cacheline of pull constants at a time.