Hi Andre! Wow, that's a bit much to read right now. The problem I have with using AVX2 would be portability, which would be no issue if we wrapped RNG in a VOLK kernel and offered a good baseline competitor in portable C code. Point is though that our alternative in RNG seems to be boost::mt19937, which we could only "extern C{...}" into VOLK. I don't like that architecturally, but I think it's definitely something I'd like to hear Nathan's input on.
What I do like, however is the takeaway that, "we can safely ignore the risk of overlapping subsequences" for two parallelly running mt19937 sequences that were seeded differently -- which means that we can well scale random number generation to multiple cores, and especially lets us just divide each output buffer into n_threads subbuffers and let the different generators run parallely on that, without wrecking memory coherency bus bandwidth too much. I guess that we'd ideally just spawn as many threads as there are 4kB-pages to fill with random samples (meaning, at every 512 gr_complexes), just a wild guess. Cheers, Marcus On 02.09.2015 16:35, Andre Puschmann wrote: > Hi guys, > > a few days ago I stumbled over an RNG implementation [2] that uses some > of the newer AVX2 instructions. I guess this would improve performance > radically. There is also a paper on the matter [1]. > > Cheers > Andre > > > [1] http://agner.org/random/theory/randomvector.pdf > [2] http://agner.org/random/ > > > On 02.09.2015 15:28, Marcus Müller wrote: >> Hi Stefan, >> >> strange, I was looking at the same code just a few days ago, when I >> needed to find out whether I understood filters well enough by pushing >> noise through. Here's my small testcases (thankfully I didn't restart my >> machine since then -- these were still in /tmp). >> >> So, in fact, I tested the performance of boost's mt19937 + boost's >> normal distribution variate, and basically, it's relatively slow, >> indeed; generating 1e8 complex random numbers (meaning 200e8 random >> floats) with a single thread takes >> >> g++ (GCC) 5.1.1 20150618 (x86_64) >> test_rnd (Boost) >> g++ -OXXX test_rnd.cpp >> test_gr (gr::random) >> g++ $(pkg-config --cflags --libs gnuradio-runtime) -OXXX te_gr.cpp >> -O3 (optimized for speed) >> 1.25 s >> 5.92 s >> -O0 (explicitly unoptimized) (default) >> 23.62 s >> 7.13 s >> >> >> Someone who cares for speed could get a 5x increase of speed by using >> boost, because that person would be using optimization, anyways. >> However, I can't really explain what happens with boost and the >> unoptimized case; it's really three times as bad as the current >> implementation. >> However, asking perf about this, the -O3 version spends nearly all of >> its time in main(), whereas the -O0 spends most of it in some boost >> functions -- which means that -O3 definitely inlines the calculations, >> and from the disassembly alone I'd say all the stack operations needed >> to jump in and out of routines seem to contribute significantly to the >> problem. >> >> Of course, that's not really a benchmark for all systems. What about 32 >> bit? What about ARM? What about clang? Can anyone try to make a Windows >> build? >> >> Best regards, >> Marcus >> >> On 02.09.2015 14:10, Stefan Wunsch wrote: >>> Hi! >>> >>> I have discovered that the implemented random number generator in >>> gnuradio (see file [0]) is almost older than me. As written in the code, >>> the implementation is taken from 'Numerical recipes in C' (see version >>> from 1992). The problem is that this algorithm is really bad compared to >>> current algorithms. E.g. the period length is 1e8 compared to an >>> up-to-date algorithm (Mersenne twister) with a period length of about >>> 1e6000. >>> >>> I have fixed this [1] using boost.random and the mentioned Mersenne >>> twister. Furthermore I have written some test-cases and fixed the >>> transformation to Laplacian random numbers (the current implementation >>> is wrong). As well, I have added the random class to swig so that you >>> can use this in python. >>> >>> Now my question: Before doing a pull request, do you have any concerns >>> regarding memory use or processing load? Obviously the new >>> implementation isn't that light-weight as the ten lines of code before. >>> But the current implementation can not be used in any serious simulation >>> or publication, which is highly dependent on good random numbers. Some >>> information about the performance is given on this page: [2]. Look for >>> the generator mt19937 in table 24.5. >>> >>> Best regards >>> Stefan >>> >>> [0] gnuradio/gnuradio-runtime/lib/math/random.cc >>> [1] https://github.com/stwunsch/gnuradio/tree/newRandom >>> [2] >>> http://www.boost.org/doc/libs/1_59_0/doc/html/boost_random/reference.html >>> >>> _______________________________________________ >>> Discuss-gnuradio mailing list >>> Discuss-gnuradio@gnu.org >>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio >> >> >> _______________________________________________ >> Discuss-gnuradio mailing list >> Discuss-gnuradio@gnu.org >> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio >> > > > _______________________________________________ > Discuss-gnuradio mailing list > Discuss-gnuradio@gnu.org > https://lists.gnu.org/mailman/listinfo/discuss-gnuradio _______________________________________________ Discuss-gnuradio mailing list Discuss-gnuradio@gnu.org https://lists.gnu.org/mailman/listinfo/discuss-gnuradio