If you do end up with a good repro for the performance difference between
typed arrays and arrays, please post it at crbug.com/v8/new.

On Fri, Apr 6, 2018 at 8:33 AM, <mog...@syntheticsemantics.com> wrote:

> Are you able to hoist the memory allocations out of the library, so the
> caller can allocate the buffers it needs and reuse them from call to call?
> new in JS has GC overhead not present in C++ alloc/free.  Aside from
> those variables, the rest are stack allocations and GC won't play much of a role.
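A minimal sketch of that suggestion: the caller owns the buffer and the library only fills it. fillRandom is a hypothetical entry point, and Math.random stands in for the real generator.

```javascript
// Sketch of the reuse pattern: the library writes into a caller-supplied
// Uint32Array instead of allocating per call.  fillRandom is hypothetical;
// Math.random stands in for the real generator.
function fillRandom(out /* Uint32Array supplied by the caller */) {
  for (let i = 0; i < out.length; i++) {
    out[i] = (Math.random() * 0x100000000) >>> 0;
  }
  return out;
}

// Allocate once, reuse across calls: nothing new for the GC to track
// on the hot path.
const scratch = new Uint32Array(8);
fillRandom(scratch);
fillRandom(scratch); // same backing store each time
```

The design point is that allocation policy belongs to the caller; the library's hot path then contains no `new` at all.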
>
>                      -J
>
>
> On Wednesday, April 4, 2018 at 7:03:05 PM UTC-7, J Decker wrote:
>>
>> How long of a story to tell(?)...
>>
>> I have this procedural random generator that uses the result of sha2 as
>> a stream of bits, regenerating another sha2 hash when all 256 bits are
>> consumed.  I started using it for perlin noise generation, and thought I'd
>> found that sha2 was the culprit consuming most of the time.  Since I'm
>> making something just generally random, I went looking for alternative
>> lightweight RNGs.  After digging for a while I stumbled on PCG
>> (http://www.pcg-random.org/using-pcg.html).  It's basically a header-only
>> library, since it attempts to generate everything as inline...
>>
>> I made a JS port of it:
>> https://gist.github.com/d3x0r/345b256be6569c0086c328a8d1b4be01
>> This is the first revision:
>> https://gist.github.com/d3x0r/345b256be6569c0086c328a8d1b4be01/fffa8e906d5723e66f7e9baa950b3b3d5b4895c7
>> The current version follows the C code's flow more closely.  It's fast,
>> generating 115k bits per millisecond (vs. the 9.3k bits/ms of sha2);
>> however, compared to the C version, which generates 1.1M bits/ms, it's a
>> factor of 10 off, even though the routine is mostly just doing 64-bit
>> integer math (and the C test was compiled in 32-bit mode, so it was really
>> 32-bit registers emulating 64-bit).
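For readers following along, the PCG step being ported is tiny. Below is a hedged sketch of pcg32 using BigInt for the 64-bit state math, with the constants from the reference implementation at pcg-random.org; this is not the gist's code, which instead splits the 64-bit state across 32-bit array halves.

```javascript
// Sketch of pcg32 using BigInt for 64-bit state arithmetic.
// Multiplier and seeding sequence follow the PCG reference implementation.
const MASK64 = (1n << 64n) - 1n;
const MULT = 6364136223846793005n;

function makePcg32(initstate, initseq) {
  let state = 0n;
  const inc = ((initseq << 1n) | 1n) & MASK64; // stream selector, must be odd
  const step = () => { state = (state * MULT + inc) & MASK64; };
  step();
  state = (state + initstate) & MASK64;
  step();
  return function next() {
    const old = state;
    step();
    // XSH-RR output function: xorshift-high, then a data-dependent rotate,
    // carried out back in 32-bit Number space.
    const xorshifted = Number((((old >> 18n) ^ old) >> 27n) & 0xffffffffn);
    const rot = Number(old >> 59n); // 0..31
    return ((xorshifted >>> rot) | (xorshifted << ((32 - rot) & 31))) >>> 0;
  };
}
```

BigInt operations are noticeably slower than 32-bit Number ops in V8, which is presumably why the gist emulates the 64-bit multiply with 32-bit halves; this sketch just shows the algorithm's shape.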
>>
>> If I just change the plain arrays created in getState() (the first
>> function) to Uint32Array, it runs MUCH slower...
>>
>> ----
>> As I write this I've been updating things, and some of my numbers above
>> are a factor of 8 off because I was counting bytes, not bits; except for
>> the sha2, which really is slow...  But I would like to take this
>> opportunity to say...
>>
>>     crypto.subtle.digest("SHA-256", buffer).then(hash=>hash );
>>
>> produces the same output type as the javascript version I'm using (forked
>> from a fork of the forge library and consolidated to just the one return
>> type...), but is another 10x slower than my javascript sha-256.
>>
>> I keep thinking 'Oh, I'll just compile this and even use the Intel SHA
>> extensions (the sha256msg1/sha256msg2 instructions) to make the C version
>> 8x faster than straight C, which itself was already faster than the JS
>> version, and hook it into a... (oh wait, I want to do this on a webpage!
>> can't use a Node addon there...).
>>
>> Well... back to optimizing.
>> ----
>>
>> I was also working on a simple test case to show where using a simple
>> array vs. a typed array causes a speed difference, but it's not
>> immediately obvious what I'm doing that causes it to deoptimize... so
>> I'll keep building that up until it breaks, or conversely strip the other
>> down until it speeds up...
>>
>> https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js#L86
>> This is getting the bits from a typed array, and it's really not that
>> complex (especially when only getting 1 bit at a time, which is what I was
>> last speed-testing with).  It turns out all the time is really spent here:
>> swapping sha2 out for pcg (without typed arrays) dropped that from 150ms
>> to 50ms, but the remainder was still 3500ms... so I guess I misread the
>> initial performance graph...
>>
>> There's a stack of what were C macros to make the whole thing more
>> readable...
>> https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js#L25
>> If I inline these there's no improvement, so I guess they're all small
>> enough to qualify for automatic inlining anyway.  The version currently on
>> github ended up creating a new Uint32Array(1) for every result; locally I
>> moved that out so a single buffer is reused for the result, and that sped
>> up initialization from 700ms to 200ms (cumulative times), but something
>> like 80% of the time is still in the remainder of the getBuffer routine;
>> maybe I need to move things out of the Uint8Arrays (data from sha2/pcg)
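The bit-extraction step that getBuffer performs can be isolated for profiling. A minimal sketch of that job follows; the LSB-first bit order and the refill callback are assumptions for illustration, not the gist's exact layout.

```javascript
// Sketch of a bit-stream reader over a Uint8Array (e.g. a sha2 digest or a
// block of pcg output).  A small integer cache keeps single-bit reads on
// the fast path; refill is a hypothetical callback that supplies a fresh
// Uint8Array when the current one is exhausted.
function makeBitReader(refill) {
  let bytes = refill();
  let byteIndex = 0;
  let cache = 0;      // bits waiting to be consumed, LSB first
  let cacheBits = 0;

  return function getBits(count /* 1..32 */) {
    let result = 0;
    let got = 0;
    while (got < count) {
      if (cacheBits === 0) {
        if (byteIndex === bytes.length) { bytes = refill(); byteIndex = 0; }
        cache = bytes[byteIndex++];
        cacheBits = 8;
      }
      const take = Math.min(count - got, cacheBits);
      // Accumulate with multiplication so a 32-bit result never overflows
      // JS's signed 32-bit shift semantics.
      result += (cache & ((1 << take) - 1)) * 2 ** got;
      cache >>>= take;
      cacheBits -= take;
      got += take;
    }
    return result;
  };
}
```

Keeping `cache`/`cacheBits` as plain 32-bit integers (rather than re-indexing into the typed array per bit) is the kind of change that tends to show up clearly in a V8 profile of a routine like getBuffer.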
>>
>>
>>

-- 
v8-users mailing list
v8-users@googlegroups.com
http://groups.google.com/group/v8-users