This patch adds a vectorized implementation of the Mersenne Twister random
number generator for AArch64. It is approximately 2.6 times faster than the
non-vectorized implementation.
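
For anyone who wants to check the speedup on their own hardware, a rough timing harness could look like the sketch below. The helper name time_engine and its parameters are hypothetical and not part of the patch; it draws raw values from whichever engine type it is instantiated with and reports the elapsed wall-clock time.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper, not part of the patch: draws `count` raw values from
// a freshly seeded engine and returns the elapsed wall-clock time in seconds.
// The running XOR of the values is stored in `sink` so the compiler cannot
// optimize the loop away.
template <typename Engine>
double
time_engine(std::uint32_t seed, std::size_t count, std::uint64_t& sink)
{
  Engine eng(seed);
  std::uint64_t acc = 0;

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < count; ++i)
    acc ^= eng();
  const auto stop = std::chrono::steady_clock::now();

  sink = acc;
  return std::chrono::duration<double>(stop - start).count();
}
```

Instantiating it once as time_engine<std::mt19937> and once as time_engine<__gnu_cxx::sfmt19937> (the latter needs <ext/random>) with the same seed and count, and dividing the two times, gives an estimate of the ratio. The result will vary with compiler flags and hardware.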

With this patch, the optimized <ext/random> includes "arm_neon.h". This
pollutes the global namespace with the Neon intrinsics and vector types, so
user macros and functions could clash with them. Is this acceptable, given
that it only happens when <ext/random> is explicitly included? Comments and
input are welcome.
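
To make the concern concrete, here is a minimal sketch of user code that reuses a Neon spelling. The namespace user and the function sum are made up for illustration. Declared at global scope, a struct named float32x4_t would be expected to conflict with the typedef from arm_neon.h once the header is visible; inside a namespace it does not.

```cpp
#if defined(__ARM_NEON)
#  include <arm_neon.h>   // stands in for what the optimized <ext/random> pulls in
#endif

namespace user   // hypothetical user namespace
{
  // Same spelling as the Neon vector type. This only compiles alongside
  // arm_neon.h because it is namespaced; at global scope it would be
  // expected to conflict with the header's own float32x4_t.
  struct float32x4_t
  {
    float lane[4];
  };

  // Adds the four lanes of the user-defined type.
  inline float
  sum(const float32x4_t& v)
  {
    return v.lane[0] + v.lane[1] + v.lane[2] + v.lane[3];
  }
}
```

User macros are the harder case, since a macro defined before the include is not confined by any namespace.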

Sample code to use the new generator would look like this:

#include <random>
#include <ext/random>
#include <iostream>

int
main()
{
  // Seed the SIMD-oriented engine; it is used like any standard engine.
  __gnu_cxx::sfmt19937 mt(1729);

  std::uniform_int_distribution<int> dist(0, 1008);

  for (int i = 0; i < 16; ++i)
    {
      std::cout << dist(mt) << " ";
    }
  std::cout << '\n';
}
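
Since the patch also provides operator== for the engine, a quick sanity check of its behavior might look like the following sketch. The helper name equality_tracks_state is hypothetical; it is written as a template so the same check can be run against std::mt19937 and against __gnu_cxx::sfmt19937.

```cpp
#include <cstdint>
#include <random>

// Hypothetical check, not part of the patch: two engines built from the same
// seed should compare equal, stop comparing equal once only one of them has
// advanced, and compare equal again after the other catches up.
template <typename Engine>
bool
equality_tracks_state(std::uint32_t seed)
{
  Engine a(seed);
  Engine b(seed);

  if (!(a == b))
    return false;

  a();            // advance only the first engine
  if (a == b)
    return false;

  b();            // bring the second engine to the same position
  return a == b;
}
```

For example, equality_tracks_state<__gnu_cxx::sfmt19937>(1729) would be expected to return true with <ext/random> included.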



2017-06-01  Michael Collison  <michael.colli...@arm.com>

        Add optimized implementation of Mersenne Twister for AArch64.
        * config/cpu/aarch64/opt/ext/opt_random.h: New file.
        (__arch64_recursion): New function.
        (operator==): New function.
        (simd_fast_mersenne_twister_engine): New template class.
        * config/cpu/aarch64/opt/bits/opt_random.h: New file.
        * include/ext/random: Add include for arm_neon.h.
        (simd_fast_mersenne_twister_engine): Add _M_state private
        array for ARM_NEON conditional compilation.

Attachment: gnutools-4218-v10.patch