Re: [PATCH] add self-tuning to x86 hardware fast path in libitm

Nuno Diegues Thu, 09 Apr 2015 03:42:58 -0700

On Wed, Apr 8, 2015 at 6:54 PM, Andi Kleen <a...@firstfloor.org> wrote:
>> On the STAMP suite of benchmarks for transactional memory (described
>> here [1]), I have run an unmodified GCC 5.0.0 against the patched GCC
>> with these modifications and obtained the following speedups in STAMP
>> with 4 threads (on a Haswell with 4 cores, average of 10 runs):
>
> I expect you'll need different tunings on larger systems.


I did not quite understand the extent of your comment: what
specifically would need different tuning? The idea is precisely that
this proposal is not tied to any particular workload or deployment;
there are some parameters (i.e., the magic numbers we discussed), but
they are quite reasonable: each one of them has a sensible value
with a meaning we understand.


>
>> That is a good point. While I haven't ever used fixed point
>> arithmetic, a cursory inspection reveals that it does make sense and
>> seems applicable to this case.
>> Are you aware of some place where this is being done already within
>> GCC that I could use as inspiration, or should I craft some macros
>> from scratch for this?
>
> I believe the inliner uses fixed point. Your own macros should be fine too.

Thanks, will try this out.
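
To make the idea concrete, here is a minimal sketch of how the
floating-point test `change_for_worse > 1.40` could be expressed in
fixed point. It assumes that change_for_worse is the ratio of the last
throughput to the current one, and that both are available as integer
counts; both are assumptions for illustration. All names below
(fixed_t, fixed_ratio, fixed_from_percent, worse_by_more_than) are
hypothetical and are not taken from the patch or from GCC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q16.16 fixed-point type: the low 16 bits hold the fraction.
// A 64-bit container leaves headroom when shifting 32-bit inputs.
typedef uint64_t fixed_t;
static const unsigned FIXED_SHIFT = 16;

// Ratio num/den as a fixed-point value (truncated towards zero).
static inline fixed_t fixed_ratio(uint32_t num, uint32_t den)
{
  assert(den != 0);
  return (fixed_t(num) << FIXED_SHIFT) / den;
}

// Fixed-point constant from a percentage, e.g. 140 -> 1.40.
static inline fixed_t fixed_from_percent(uint32_t percent)
{
  return (fixed_t(percent) << FIXED_SHIFT) / 100;
}

// True when last/current exceeds percent/100, using integer math only.
static inline bool worse_by_more_than(uint32_t last_throughput,
                                      uint32_t current_throughput,
                                      uint32_t percent)
{
  return fixed_ratio(last_throughput, current_throughput)
         > fixed_from_percent(percent);
}
```

With this, the check in the patch would read something like
worse_by_more_than(last, current, 140), with no floating point in the
transactional fast path.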


>
>> > > +  int32_t last_attempts = optimizer.last_attempts;
>> > > +  int32_t current_attempts = optimizer.optimized_attempts;
>> > > +  int32_t new_attempts = current_attempts;
>> > > +  if (unlikely(change_for_worse > 1.40))
>> > > +    {
>> > > +      optimizer.optimized_attempts = optimizer.best_ever_attempts;
>> > > +      optimizer.last_throughput = current_throughput;
>> > > +      optimizer.last_attempts = current_attempts;
>> > > +      return;
>> > > +    }
>> > > +
>> > > +  if (unlikely(random() % 100 < 1))
>> > > +    {
>> >
>> > So where is the seed for that random stored? Could you corrupt some
>> > user's random state? Is the state per thread or global?
>> > If it's per thread, how do you initialize it so that the threads
>> > start with different seeds?
>> > If it's global what synchronizes it?
>>
>> As I do not specify any seed, I was under the impression that there
>> would be a default initialization. Furthermore, the POSIX
>> documentation specifies random() to be MT-safe, so I assumed its
>> internal state to be per-thread.
>> Did I misinterpret this?
>
> Yes, that's right. But it's very nasty to change the user's RNG state.
> A common pattern for repeatable benchmarks is to start with srand(1)
> and then use the random numbers to run the benchmark, so it always does
> the same thing. If you non-deterministically (transaction aborts are not
> deterministic) change the random state, it will make the benchmark not
> repeatable anymore. You'll need to use your own RNG state that is independent.

I understand your concern, thanks for raising it.
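
For reference, a minimal sketch of what such an independent RNG could
look like: a small xorshift32 generator with its own state, so that
random()/srand() in the user's program are never touched. The names
(private_rng, private_rng_seed, private_rng_next, one_percent_chance)
are hypothetical, and how each thread obtains a distinct seed (for
instance from a thread-specific value) is left as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical private generator state; one instance per thread would
// keep threads from sharing (and racing on) the same sequence.
struct private_rng
{
  uint32_t state;
};

// Seed the generator. xorshift must never hold a zero state, so a zero
// seed is replaced by an arbitrary nonzero constant.
static inline void private_rng_seed(private_rng &rng, uint32_t seed)
{
  rng.state = seed ? seed : 0x9e3779b9u;
}

// One step of Marsaglia's xorshift32 (shifts 13, 17, 5).
static inline uint32_t private_rng_next(private_rng &rng)
{
  assert(rng.state != 0);
  uint32_t x = rng.state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  rng.state = x;
  return x;
}

// Stand-in for the `random() % 100 < 1` test, using private state only.
static inline bool one_percent_chance(private_rng &rng)
{
  return private_rng_next(rng) % 100 < 1;
}
```

Since the state lives entirely in the library, a benchmark that calls
srand(1) would keep seeing the same sequence from its own calls
regardless of how many transactions abort.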

One general question on how to proceed:
given that I make some further changes, should I post the whole patch again?


Best regards,
-- Nuno Diegues


>
> It would be good to see if any parts of the algorithm can be
> simplified. In general in production software the goal is to have
> the simplest algorithm that does the job.
>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only.
