On Tue, Mar 15, 2016 at 12:10 PM, Michael Niedermayer <mich...@niedermayer.cc> wrote: > On Mon, Mar 14, 2016 at 08:42:32PM -0400, Ganesh Ajjanagadde wrote: >> On Sun, Mar 13, 2016 at 11:08 PM, Michael Niedermayer >> <mich...@niedermayer.cc> wrote: >> > On Sun, Mar 13, 2016 at 07:12:50PM -0400, Ganesh Ajjanagadde wrote: >> >> This is based on the relatively well known xorshift128+ of Sebastiano >> >> Vigna (https://en.wikipedia.org/wiki/Xorshift) that performs very well on >> >> the >> >> BigCrush suite, is very efficient, and is thus used by a number of >> >> clients: http://xorshift.di.unimi.it/ (see Introduction). >> >> >> >> Moreover, the implementation is in the public domain: >> >> http://xorshift.di.unimi.it/xorshift128plus.c. >> >> >> >> Concretely, it is nearly as fast as av_lfg_get (which only returns 32 >> >> bits), >> >> and has a much smaller cache (128 bits). Thus, the timings should also >> >> be more stable. >> >> >> >> This is needed because av_lfg_get<<32 | av_lfg_get is far slower, and >> >> likely less random as measured by BigCrush - most LFG's perform >> >> quite poorly on the BigCrush suite: >> >> http://www6.tw.freebsd.org/distfiles/testu01.pdf. >> >> In particular, FFmpeg's seems to be Ran055 in the paper, see pg31. >> >> >> >> Sample benchmark (Haswell, GCC + -march=native): >> >> 23200 decicycles in 624 calls of av_lfg_get, 1 runs, 0 skips >> >> 23040 decicycles in 624 calls of av_lfg_get, 2 runs, 0 skips >> >> 22810 decicycles in 624 calls of av_lfg_get, 4 runs, 0 skips >> >> [...] >> >> 19884 decicycles in 624 calls of av_lfg_get, 65532 runs, 4 skips >> >> 19880 decicycles in 624 calls of av_lfg_get, 131067 runs, 5 skips >> >> 19884 decicycles in 624 calls of av_lfg_get, 262136 runs, 8 skips >> >> 19877 decicycles in 624 calls of av_lfg_get, 524276 runs, 12 skips >> >> >> >> 25380 decicycles in 624 calls of av_rand64_get, 1 runs, 0 >> >> skips >> >> 28560 decicycles in 624 calls of av_rand64_get, 2 runs, 0 >> >> skips >> >> 30112 decicycles in 624 calls of av_rand64_get, 4 runs, 0 >> >> skips >> >> [...] >> >> 22627 decicycles in 624 calls of av_rand64_get, 65536 runs, 0 >> >> skips >> >> 22625 decicycles in 624 calls of av_rand64_get, 131072 runs, 0 >> >> skips >> >> 22625 decicycles in 624 calls of av_rand64_get, 262143 runs, 1 >> >> skips >> >> 22624 decicycles in 624 calls of av_rand64_get, 524286 runs, 2 >> >> skips >> >> >> >> Signed-off-by: Ganesh Ajjanagadde <gajja...@gmail.com> >> >> --- >> >> libavutil/lfg.c | 33 ++++++++++++++++++++++++++++++++- >> >> libavutil/lfg.h | 26 ++++++++++++++++++++++++++ >> >> 2 files changed, 58 insertions(+), 1 deletion(-) >> > >> > why do you add this code to lfg.* (Lagged Fibonacci Generator)? >> > its not a lfg >> >> I just wanted to lay out the implementation, it was easier for me to >> test and compare that way. Also, don't really know if it should be >> public or private. I really felt that all random number stuff should >> go into e.g lavu/rand.h, right now it is quite scattered across >> random_seed.c, lfg.h (which is not great from a discoverability >> standpoint). >> >> Anyway, if you are flexible about where it could go, I will resubmit >> as lavu/rand.h, lavu/rand.c containing this 64 bit gen. Is this fine >> with you? >> >> Some day or the other this can be cleaned up into e.g a single header, >> but that maybe best handled later. >> >> > >> > also the LFG could be trivially extended/changed to 64bit if one wants >> > only needs uint64_t being used >> >> I can believe the extendability, but note that I do not know about the >> quality of such generators. The linked TestU01 paper did not seem too >> happy with the additive lagged Fibonacci generators, and instead >> favored multiplicative ones like av_mlfg_get. >> >> > >> > also theres av_mlfg_get() which passes all tests, though slower of >> > course >> >> Are you sure it passes all of BigCrush? What test suite are you referring to? > > http://www6.tw.freebsd.org/distfiles/testu01.pdf > it lists several multiplicative LFGs they all pass all tests > See "Table I. Results of test batteries applied to well-known RNGs" > unless i misunderstand > > of course all generators of this class are expected to have defects > in the least significant bits if one searches hard enough. > I assumed that these tests dont search hard enough > also every PRNG fails a test otherwise it wouldnt be a PRNG but a > RNG ... > > that said, i would be somewhat surprised if the more significant side > of the mlfg values fail many tests that arent designed specifically > for mlfgs > > >> >> > >> > and does xorshift128+ really pass all tests ? >> >> This was a claim on wikipedia that is I believe incorrect, > > wikipedia ... sigh > > >> Vigna >> himself makes no such absolute claims. All he claims is (from >> http://xorshift.di.unimi.it/xorshift128plus.c): >> " >> >> /* This is the fastest generator passing BigCrush without >> systematic failures, but due to the relatively short period it is >> acceptable only for applications with a mild amount of parallelism; >> otherwise, use a xorshift1024* generator. */ >> " >> As such, he does not rule out all possible failures. >> In fact, his page http://xorshift.di.unimi.it/#quality suggest certain >> failures. >> >> I therefore did not mention explicit absolute claims in the message either. > > I would have been very surprised if this basic generator didnt have > defects in stronger tests. > the LSBs should fail basic linearity tests > > IIUC/IIRC marsaglia suggested to use the xorshift with a multipliction > as non linear step. If thats done id have little doubt in the generator > but i didnt read the xorshift papers so i might be half wrong on some > things ... > > with just an addition as non linear step i have a odd feeling. > This is all no doubt purely academic and xorshift128+ is likely > more then good enough for us. It just feels a bit "weak" to have a > purely linear generator that is scrabled by a simple addition. > also iam thinking a bit here in relation to a LFG which too uses > just an addition and is weak too. > > i think what iam trying to say is that xorshift128plus isnt faster > than the LFG and while its maybe better its not much better
It is definitely better than the current av_lfg_get() << 32 | av_lfg_get in the context of Gaussian number generation. I noticed this while running the Anderson Darling tests (patch 3). Of course this does not mean a whole lot, ultimately there will always exist some number of samples for which a Gaussian test will fail. Nevertheless, if we want good, fast Gaussian samples, I would like this in. But see also the thread, atomnuker has other ideas, Derek is not too happy, etc. > > about the performance comparission. > LFGs should be optimizeable in SIMD, is that worth it i dont know Doubt it, if one needs that level of performance, I doubt once is terribly interested in quality at that point and can simply use the suggested fast alternatives. > > > [...] > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Breaking DRM is a little like attempting to break through a door even > though the window is wide open and the only thing in the house is a bunch > of things you dont want and which you would get tomorrow for free anyway > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel