Fabien COELHO wrote on 2021-07-06 23:49:
> Hello Yura,
>
>>> However, I'm not enthusiastic about combining two methods depending
>>> on the range; the function looks complex enough without that, so I
>>> would suggest not taking this option. Also, the decision process adds
>>> to the average cost, which is undesirable.
>>
>> Given that 99.99% of cases will take the likely branch, the branch
>> predictor should eliminate the decision cost.
>
> Hmmm. ISTM that a branch predictor should predict that unknown < small
> should probably be false, so a hint should be given that it is really
> true.
Why? A branch predictor is history based: if the branch was usually
taken here before, it will be predicted taken this time as well.
`unknown < small` is usually true here, therefore the branch predictor
will assume it is true. I put `likely` there for the compiler: the
compiler then lays out the `likely` path as the fall-through one.

And as Dean Rasheed correctly mentioned, the mask method has a really
bad pattern for the branch predictor if the range is not just below or
equal to a power of two (for a range just above 2^k the mask has to
cover k+1 bits, so almost half of the draws are rejected and the loop
branch is essentially a coin flip).
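To illustrate, here is a rough standalone demo (not part of the patch;
xorshift64 is only a stand-in generator) of how often the mask method
rejects when the range is just above a power of two:

#include <stdint.h>
#include <stdio.h>

/* Marsaglia's xorshift64, only a stand-in generator for this demo */
static uint64_t demo_state = 88172645463325252ULL;

static uint64_t
xorshift64(void)
{
	demo_state ^= demo_state << 13;
	demo_state ^= demo_state >> 7;
	demo_state ^= demo_state << 17;
	return demo_state;
}

int
main(void)
{
	uint64_t	range = (UINT64_C(1) << 20) + 1;	/* just above 2^20 */
	uint64_t	mask = (UINT64_C(1) << 21) - 1;		/* smallest covering mask */
	uint64_t	rejected = 0;
	uint64_t	trials = 1000000;

	for (uint64_t i = 0; i < trials; i++)
		if ((xorshift64() & mask) >= range)
			rejected++;

	/* prints roughly 50%: the rejection branch is essentially a coin flip */
	printf("rejected %.1f%% of draws\n",
		   100.0 * (double) rejected / (double) trials);
	return 0;
}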
> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned, I have a headache just looking at it :-)
If you mean https://www.pcg-random.org/posts/bounded-rands.html, then:

1. The first graphs there are not made with Lemire's exact code. The
   latest code from
   https://lemire.me/blog/2016/06/30/fast-random-shuffling/
   (which I derived mine from) performs the modulo operation only if
   `(leftover < range)`. Even with `rand_range(1000000)` that happens
   only about once in four thousand calls (see the sketch after this
   list).
2. You can see the "Small-Constant Benchmark" on that page: Debiased
   Int is 1.5 times faster. And even in the "Small-Shuffle" benchmark
   their unoptimized version is on par with the mask method.
3. If you go to the "Making Improvements / Faster Threshold-Based
   Discarding" section, you'll see the code my version matches. It is
   twice as fast as the mask method in the Small-Shuffle benchmark, and
   just a bit slower in Large-Shuffle.
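For reference, a minimal sketch of that debiased multiply method (the
names and the rng64 callback are mine, not from the patch; it relies on
the GCC/Clang 128-bit integer extension and assumes range > 0):

#include <stdint.h>

typedef unsigned __int128 uint128_t;	/* GCC/Clang extension */

/*
 * Debiased multiply (after Lemire's fast-random-shuffling code): the
 * high 64 bits of val * range fall into [0, range); rejection via the
 * "-range % range" threshold (== 2^64 mod range, thanks to unsigned
 * wraparound) is computed only on the rare "leftover < range" path.
 */
static uint64_t
bounded_rand_mult(uint64_t (*rng64) (void), uint64_t range)
{
	uint64_t	val = rng64();
	uint128_t	prod = (uint128_t) val * range;
	uint64_t	leftover = (uint64_t) prod;	/* low 64 bits */

	if (leftover < range)
	{
		uint64_t	threshold = -range % range;

		while (leftover < threshold)
		{
			val = rng64();
			prod = (uint128_t) val * range;
			leftover = (uint64_t) prod;
		}
	}
	return (uint64_t) (prod >> 64);		/* unbiased value in [0, range) */
}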
>> And __builtin_clzl is not a free lunch either: it has a latency of
>> 3-4 cycles on modern processors.
>
> Well, % is not cheap either.

Well, it could probably run in parallel with some part of xoroshiro,
but that depends on how the compiler optimizes this function.
>>> I would certainly select the unbiased multiply method if we want a
>>> u32 range variant.
>>
>> There could be two functions.
>
> Yep, but do we need them? Who is likely to want 32-bit pseudo-random
> ints in a range? pgbench needs 64 bits.
>
> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite that it may be slower for some ranges. The average
> cost for the worst case in PRNG calls is, if I'm not mistaken:
>
>   1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
>
> which does not look too bad.
You didn't count the cost of branch misprediction:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/

Therefore the calculation should be at least:

  1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6

By the way, we have a 64-bit random value. If we use 44 bits of it for
range <= (1<<20), then the bias will be less than 1/(2**24): the bias
of the fixed-point multiply is at most range/2^44, and 2^20/2^44 =
1/(2**24). Could we just ignore it (given that it is not a
cryptographically strong random)?
uint64
pg_prng_u64_range(pg_prng_state *state, uint64 range)
{
	uint64		val = xoroshiro128ss(state);
	uint64		m;

	/* handle all power-of-two cases (and range == 0) */
	if ((range & (range - 1)) == 0)
		return range != 0 ? val & (range - 1) : 0;

	if (likely(range < (1 << 20)))
	{
		/*
		 * While the multiply method is biased, the bias is smaller than
		 * 1/(1<<24) for such small ranges.  Let's ignore it.
		 */
		return ((val >> 20) * range) >> 44;
	}

	/* Apple's bitmask method with rejection */
	m = mask_u64(range - 1);
	val &= m;
	while (val >= range)
		val = xoroshiro128ss(state) & m;
	return val;
}
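mask_u64() above is not spelled out; a minimal sketch of what I have in
mind, using the GCC/Clang builtin __builtin_clzll (undefined for a zero
argument, but here the caller only passes range - 1 for a
non-power-of-two range, so it is never zero):

/* smallest all-ones mask covering u, i.e. 2^(floor(log2(u)) + 1) - 1 */
static inline uint64
mask_u64(uint64 u)
{
	return (~(uint64) 0) >> __builtin_clzll(u);
}

A portable build would need a fallback for compilers without the
builtin; PostgreSQL's pg_leftmost_one_pos64() already wraps this kind
of thing.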
Anyway, excuse me for heating up this discussion over such a
non-essential issue. I'll try to control myself and not take it
further.
regards,
Sokolov Yura.