If you do end up with a good repro for the performance difference between typed arrays and arrays, please post it at crbug.com/v8/new.
On Fri, Apr 6, 2018 at 8:33 AM, <mog...@syntheticsemantics.com> wrote:

> Are you able to hoist the memory allocations out of the library so the
> caller can allocate the buffers it needs and reuse them from call to call?
> `new` in JS has GC overhead not present in C++ alloc/free. Aside from those
> variables, the rest are stack allocations and GC won't play much of a role.
>
> -J
>
> On Wednesday, April 4, 2018 at 7:03:05 PM UTC-7, J Decker wrote:
>>
>> How long a story to tell(?)...
>>
>> I have this procedural random generator that uses the result of SHA-2 as
>> a stream of bits, regenerating another SHA-2 hash when the 256 bits are
>> all consumed. I started to use this for Perlin noise generation, and
>> thought I had found that SHA-2 was the culprit consuming most of the
>> time. Since I'm making something just generally random, I went looking
>> for alternative, lightweight RNGs. After digging for a while I stumbled
>> on PCG (http://www.pcg-random.org/using-pcg.html). It's basically a
>> header-only library because it attempts to generate everything as
>> inline...
>>
>> I made this JS port of it:
>> https://gist.github.com/d3x0r/345b256be6569c0086c328a8d1b4be01
>> This is the first revision:
>> https://gist.github.com/d3x0r/345b256be6569c0086c328a8d1b4be01/fffa8e906d5723e66f7e9baa950b3b3d5b4895c7
>> It has a better flow, matching what the C code does more closely. The
>> current version is fast, generating 115k bits per millisecond (vs. the
>> 9.3k bits/ms of SHA-2); however, compared to the C version, which
>> generates 1.1M bits/ms, it's a factor of 10 off... and the routine is
>> generally only doing 64-bit integer math (though the test was compiled
>> in 32-bit mode, so it was really just 32-bit registers emulating 64-bit).
>>
>> If I just change the arrays created in getState() (the first function)
>> to Uint32Array, it runs MUCH slower...
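[For reference, the minimal PCG32 algorithm published at pcg-random.org can be sketched in JavaScript using BigInt for the 64-bit state. This is not the gist linked above, just the reference algorithm; BigInt carries its own overhead, which is one reason ports often emulate 64-bit math with pairs of 32-bit values instead:]

```javascript
// Sketch of the minimal PCG32 generator (pcg-random.org), using BigInt
// for the 64-bit state. NOT the linked port -- just the reference
// algorithm, for comparison.
const PCG_MULT = 6364136223846793005n;
const MASK64 = (1n << 64n) - 1n;

function pcg32(initstate, initseq) {
  // The stream selector (increment) must be odd.
  const inc = ((initseq << 1n) | 1n) & MASK64;
  let state = 0n;

  function next() {
    const old = state;
    state = (old * PCG_MULT + inc) & MASK64;
    // Output function: XSH RR (xorshift of high bits, random rotate).
    const xorshifted = Number((((old >> 18n) ^ old) >> 27n) & 0xFFFFFFFFn);
    const rot = Number(old >> 59n);
    return ((xorshifted >>> rot) | (xorshifted << ((-rot) & 31))) >>> 0;
  }

  // Standard seeding sequence from the reference implementation:
  // advance once, mix in the seed, advance again.
  next();
  state = (state + initstate) & MASK64;
  next();
  return next;
}
```

[Usage: `const rng = pcg32(42n, 54n); rng();` returns a 32-bit unsigned integer per call.]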
>>
>> ----
>> As I write this I was updating some things, and some of my numbers from
>> before are a factor of 8 off because I was counting bytes, not bits;
>> except in the SHA-2, which is really slow... But I would like to take
>> this opportunity to say...
>>
>> crypto.subtle.digest("SHA-256", buffer).then(hash => hash);
>>
>> produces the same output type as the JavaScript version I'm using
>> (forked from a fork of the forge library and consolidated to just the
>> one return type...), but is another 10x slower than my JavaScript
>> SHA-256.
>>
>> I keep thinking, 'Oh, I'll just compile this and even use the Intel
>> accelerated sha256msg1 and sha256msg2 instructions to make the C version
>> 8x faster than straight C, which itself was already faster than the JS
>> version, and hook it into a...' (oh wait, I want to do this on a
>> webpage! Can't use a Node addon there...).
>>
>> Well... back to optimizing.
>> ----
>>
>> I was also working on a simple test case to show where using a plain
>> array vs. a typed array causes a speed difference, but it's not
>> immediately obvious what I'm doing that's causing it to deoptimize... so
>> I'll work on building that up until it breaks, or conversely strip the
>> other down until it speeds up.
>>
>> https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js#L86
>> This is getting the bits from a typed array, and it's really not that
>> complex (especially if only getting 1 bit at a time, which is what I was
>> last speed testing with). But it turns out all the time is really here:
>> swapping out SHA-2 for PCG (without typed arrays) dropped that from
>> 150ms to 50ms, but the remainder was still 3500ms... so I misread the
>> initial performance graph, I guess...
>>
>> There's a stack of what were C macros to make the whole thing more
>> readable...
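[The "get N bits from a typed-array-backed stream, refilling when exhausted" pattern described above might be sketched like this. This is a hypothetical illustration, not the code at the linked #L86; `makeBitReader` and `refill` are invented names:]

```javascript
// Hypothetical sketch: read N bits at a time out of a Uint32Array that is
// refilled from a generator (SHA-2, PCG, ...) when exhausted. Invented
// names; not the author's salty_random_generator code.
function makeBitReader(refill /* () => Uint32Array */) {
  let words = refill();
  let wordIndex = 0; // which 32-bit word we're in
  let bitIndex = 0;  // how many bits of that word are consumed

  return function getBits(count) { // count must be <= 32
    let result = 0, got = 0;
    while (got < count) {
      if (wordIndex >= words.length) { // stream exhausted: regenerate
        words = refill();
        wordIndex = 0;
        bitIndex = 0;
      }
      const avail = 32 - bitIndex;
      const take = Math.min(count - got, avail);
      // Mask off `take` bits starting at bitIndex (1 << 32 wraps, so the
      // full-word case needs a special mask).
      const mask = take === 32 ? 0xFFFFFFFF : (1 << take) - 1;
      const chunk = (words[wordIndex] >>> bitIndex) & mask;
      result |= chunk << got; // pack little-endian, low bits first
      got += take;
      bitIndex += take;
      if (bitIndex === 32) { wordIndex++; bitIndex = 0; }
    }
    return result >>> 0;
  };
}
```

[One design note: pulling whole words with `>>>` and masking, as above, keeps everything in 32-bit integer operations, which V8 handles well; the cost the thread is chasing is more likely in per-call allocations than in the bit arithmetic itself.]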
>> https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js#L25
>> And if I inline these, there's no improvement, so I guess they're all
>> small enough to qualify for automatic inlining anyway. The version
>> that's currently on GitHub ended up creating a new Uint32Array(1) for
>> every result; I moved that out locally so I can use just a single buffer
>> for the result, and it sped up the initialization from 700ms to 200ms
>> (cumulative times). But there's still something like 80% of the time in
>> the remainder of the getBuffer routine; maybe I need to move things out
>> of the Uint8Arrays (data from SHA-2/PCG)...

--
v8-users mailing list
v8-users@googlegroups.com
http://groups.google.com/group/v8-users
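[The allocation-hoisting change described in the thread, as suggested in the first reply, can be illustrated generically. This is a sketch with invented names (`nextValueAllocating`, `nextValueReusing`, `fill`), not the actual getBuffer code:]

```javascript
// Sketch of hoisting a per-call allocation out to a reused scratch
// buffer, per the GC-overhead suggestion above. Names are invented.

// Before: allocates a fresh typed array on every call (GC pressure).
function nextValueAllocating(fill) {
  const scratch = new Uint32Array(1); // new allocation each call
  fill(scratch);
  return scratch[0];
}

// After: one buffer, allocated once and overwritten in place each call.
const scratch = new Uint32Array(1);
function nextValueReusing(fill) {
  fill(scratch); // reused buffer; no allocation on the hot path
  return scratch[0];
}
```

[The trade-off is that the reused buffer makes the function non-reentrant; for a single-threaded generator hot loop like the one in the thread, that is usually acceptable.]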