On 3/18/24 21:41, Jeff Law wrote:
>> The first patch is the main change which improves SPEC cactu by 10%.
> Just to confirm. Yup, 10% reduction in icounts and about a 3.5%
> improvement in cycles on our target. Which is great!
Nice.
> This also makes me wonder if cactu is the benchmark that was sensitive
> to flushing the pending queue in the scheduler. Jivan's data would tend
> to indicate that is the case as several routines seem to flush the
> pending queue often. In particular:
>
> ML_BSSN_RHS_Body
> ML_BSSN_Advect_Body
> ML_BSSN_constraints_Body
>
> All have a high number of dynamic instructions as well as lots of
> flushes of the pending queue.
>
> Vineet, you might want to look and see if cranking up the
> max-pending-list-length parameter helps drive down spilling. I think
> its default value is 32 insns. I've seen it cranked up to 128 and 256
> insns without significant ill effects on compile time.
>
> My recollection (it's been like 3 years) of the key loop was that it had
> a few hundred instructions and we'd flush the pending list about 50
> cycles into the loop as there just wasn't enough issue bandwidth to the
> FP units to dispatch all the FP instructions as their inputs became
> ready. So you'd be looking for flushes in a big loop.
Great insight.
Fired off a cactu run with --param=max-pending-list-length=128; will keep you posted.
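
For anyone who wants to try the same knob, here is a rough sketch of how the
flags could be put together. The -O2 baseline and the variable names are only
assumptions for illustration, not the actual SPEC config used for the run:

```shell
# Hypothetical flag setup: raise the scheduler's pending-list limit from
# its default (32 insns, per Jeff's note above) to 128.
PENDING_LEN=128
CFLAGS="-O2 --param=max-pending-list-length=${PENDING_LEN}"

# Print the flags so they can be pasted into a build or benchmark config.
echo "${CFLAGS}"
```

Swapping PENDING_LEN to 256 gives the other value Jeff mentioned.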
Thx,
-Vineet