Testing - Google Groups keeps deleting my responses, so I don't want to retype it all over if it's just going to be deleted again.
On Friday, January 18, 2019 at 1:17:38 PM UTC-8, Peter Wong wrote:
>
> Just wanted to add that you can access Chrome DevTools with Node.js
> "--inspect" option (depending on which version of Node). There's a
> Profiler tab instead of the Timeline tab, but it should get you the same
> data (Flame chart, bottom up, top down breakdowns).
>
> On Friday, January 18, 2019 at 2:58:21 PM UTC-6, Jakob Kummerow wrote:
>>
>> Answering what I can:
>>
>>> I changed my code from using a "while(true)" loop to calling
>>> "setImmediate" every once in a while so that it would yield to the
>>> event loop, that way the GC shouldn't interrupt a turn while it's
>>> executing. I didn't notice a difference doing this.
>>
>> Not seeing a difference there makes sense; the GC doesn't care about
>> yielding to the event loop.
>>
>>> apparently 137 ms were spent optimizing (20 ms to prepare, 115 ms to
>>> execute, and 2 ms to finalize).
>>
>> That means 20 ms were spent on the main thread some time in the past,
>> 115 ms in the background, and then 2 ms on the main thread just now (so
>> those 2 ms would be part of the 37 ms of that turn).
>>
>>> Running that same test with "--minimal"
>>
>> You mean V8's --minimal flag? That is absolutely not what you want. It's
>> a development tool that turns off everything except the bare minimum: no
>> more type feedback, no more optimizing compiler, >100x performance hit.
>> (And FWIW, that flag has recently been removed.)
>>
>>> I tried "--predictable" without seeing any differences
>>
>> Yes, --predictable is another development/debugging tool that is not
>> useful for production purposes. It turns off all sources of randomness,
>> which includes all threading. You would only see worse performance and
>> longer main-thread stalls from this.
>>
>>> Are there any other V8 flags that may be of help here?
>>
>> The default configuration is the recommended configuration for best
>> performance.
>>
>>> Is it possible to choose when to run optimizations?
>>
>> Effectively no. (There is the special intrinsic
>> %OptimizeFunctionOnNextCall, which does what the name suggests, but
>> optimization is only useful if the right type feedback is available, so
>> manually choosing when to optimize a given function only makes sense for
>> tests where you know exactly which function you want to have optimized
>> when; I can't imagine it being useful for your use case: especially for
>> user-provided code, there is no way to tell when it is ready for
>> optimization, and if you manually optimize it too early, it'll just get
>> deoptimized right away, so the manually triggered optimization would be
>> a waste of time and only introduce more/longer delays.)
>>
>> Based on the data you've provided, I'm not convinced that optimizations
>> are the reason for those long turns, but I also don't have a good guess
>> for what else it might be.
>>
>> If it were optimizations, then warmup would probably be the best
>> mitigation, but you report that warmup doesn't help, so it's probably
>> something else. The same reasoning speaks against the other
>> frequently-given advice, which is to avoid deoptimizations at least in
>> the code you control (while deopts themselves are fast, a deoptimized
>> function might get optimized again later, and that might take time).
>>
>> One thing you could try is wrapping the whole thing into a website so
>> you can use Chrome DevTools' profiling facilities. Maybe that'll make it
>> easier to drill down into those rarely-occurring delays.
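
For reference, a minimal sketch of how that %OptimizeFunctionOnNextCall
intrinsic is typically exercised in a standalone experiment. It only parses
when Node/V8 is started with --allow-natives-syntax, and (as noted above) it
is a testing aid, not something to ship; the update function below is just a
throwaway stand-in:

  // Run with: node --allow-natives-syntax <this file>
  function update() {
    let total = 0;
    for (let i = 0; i < 1000; i++) total += i;
    return total;
  }

  update();                             // run once so V8 can gather type feedback
  %OptimizeFunctionOnNextCall(update);  // ask V8 to optimize on the next invocation
  update();                             // this call triggers the optimized compilation
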
>>
>> On Fri, Jan 18, 2019 at 12:19 PM Adam Damiano <onai...@gmail.com> wrote:
>>
>>> I've tried to be as descriptive as possible below without covering
>>> irrelevant details. The summary of the problem is that I think V8
>>> optimizations may be causing unpredictable processing spikes in my game
>>> that I need to be able to either defer or measure.
>>>
>>> Background
>>>
>>> - The code is closed-source, and there are a *lot* of moving parts, so
>>>   I'm wondering mostly about theory here and not necessarily
>>>   implementation. More on this in the "Questions" section at the bottom.
>>> - I'm making a game where players write code for bots that battle each
>>>   other. The game is turn-based, and each match can take thousands of
>>>   turns. All of these turns get simulated at once in a NodeJS process
>>>   (v10.15) running in a 1-vCPU Alpine container on AWS. After a
>>>   simulation is done, the battle is essentially replayed on the client
>>>   without any input from the player.
>>> - Each simulation is CPU-intensive given that it's just running
>>>   player-defined code. There is no network or disk utilization during
>>>   this time. The most important performance metrics for me are:
>>>   - How long (in milliseconds) does an average turn take to simulate?
>>>   - How long does the longest turn take to simulate?
>>> - The average is important so that I know how many turns to limit
>>>   matches to. The measured average over tens of thousands of real
>>>   matches was about 0.3 ms/turn, so I chose 3000 turns for the limit
>>>   since 0.3 ms/turn * 3000 turns is still less than a second (which I
>>>   consider to be a reasonable amount of time for players to have to
>>>   wait for a result).
>>> - The longest turn is important so that I can figure out when to stop
>>>   executing scripts that will probably never complete on their own. For
>>>   example, if someone writes an infinite loop, then I need to cut the
>>>   execution off *somewhere*. I chose 50 ms for this.
>>> - Code is executed using "eval" (I know about the security risks and
>>>   have hopefully mitigated any attacks, but I want this issue to stay
>>>   focused on performance rather than security). Here is a simplified
>>>   version of the steps that I run, but the summary is that I "eval" a
>>>   user-defined function once, then I can call that function multiple
>>>   times per match:
>>>   - Give each bot a "namespace" for the user to put their code into:
>>>
>>>     global.bot1 = {};
>>>     global.bot2 = {};
>>>     ...
>>>     global.botN = {};
>>>
>>>   - Get code from the user for a particular bot:
>>>
>>>     const exampleUserDefinedScript = `update = function() {
>>>       fireLasers();
>>>     };`;
>>>
>>>   - Transform it slightly to be able to place it into one of the
>>>     namespaces:
>>>
>>>     const exampleUserDefinedScript = `global.bot1.update = function() {
>>>       fireLasers();
>>>     };`;
>>>
>>>   - Evaluate that to put it into my global scope:
>>>
>>>     eval(exampleUserDefinedScript);
>>>
>>>   - Simulate a match in a loop using those global functions:
>>>
>>>     while (true) {
>>>       const bot = findNextBotToUpdate();
>>>       global[bot].update(); // note: this is what is timed to give me my performance metrics
>>>       if (gameEnded()) break;
>>>     }
>>>
>>>   - Free references to the global functions:
>>>
>>>     global.bot1 = null;
>>>     global.bot2 = null;
>>>     // etc.
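
For concreteness, here is the same sequence of steps stitched into one small,
runnable sketch. The names fireLasers, findNextBotToUpdate, and gameEnded come
from the post above; their bodies here are trivial stand-ins, since the real
game logic isn't shown:

  'use strict';

  const MAX_TURNS = 3000; // turn limit mentioned above

  // Stand-in for the game API that user code is allowed to call.
  global.fireLasers = () => {};

  // 1. Give each bot a namespace.
  global.bot1 = {};
  global.bot2 = {};

  // 2./3. User-supplied code, rewritten to assign into a bot's namespace.
  const bot1Script = `global.bot1.update = function() { fireLasers(); };`;
  const bot2Script = `global.bot2.update = function() { fireLasers(); };`;

  // 4. Evaluate once; the functions then exist for the whole match.
  eval(bot1Script);
  eval(bot2Script);

  // Stand-ins for turn scheduling and the end-of-game check.
  let turn = 0;
  const findNextBotToUpdate = () => (turn % 2 === 0 ? 'bot1' : 'bot2');
  const gameEnded = () => turn >= MAX_TURNS;

  // 5. Simulate the match; each global[bot].update() call is what gets timed.
  while (true) {
    const bot = findNextBotToUpdate();
    global[bot].update();
    turn += 1;
    if (gameEnded()) break;
  }

  // 6. Drop the references so the evaluated code can be garbage collected.
  global.bot1 = null;
  global.bot2 = null;
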
>>>
>>> Problem
>>>
>>> The problem I'm running into is that individual turns sometimes take
>>> longer than 50 ms. I investigated heavily (see next section) and I
>>> think that it's due to V8 optimizing on the main thread.
>>>
>>> Investigation
>>>
>>> - As mentioned, this is running in a container with no other processes,
>>>   so the only threads that could be interrupting would be from the OS
>>>   or Node itself.
>>> - The environment that I'm running in doesn't seem to matter too much.
>>>   Running on AWS in a 1-vCPU container is my production environment,
>>>   but even in development, I see turns that take a disproportionately
>>>   long time.
>>> - I ran the Node profiler
>>>   <https://nodejs.org/en/docs/guides/simple-profiling/>, but nothing
>>>   stood out to me. Then again, I could just be interpreting the results
>>>   incorrectly. I get something like this:
>>>
>>>     Statistical profiling result from converted, (6468 ticks, 126 unaccounted, 0 excluded).
>>>     [Summary]:
>>>       ticks  total  nonlib   name
>>>       3184   49.2%  96.2%    JavaScript
>>>          0    0.0%   0.0%    C++
>>>        314    4.9%   9.5%    GC
>>>       3158   48.8%           Shared libraries
>>>        126    1.9%           Unaccounted
>>>
>>> - When I run with "--trace-gc", I see that the garbage collector is
>>>   mostly running scavenges that take 0.0 ms. Regardless, I changed my
>>>   code from using a "while(true)" loop to calling "setImmediate" every
>>>   once in a while so that it would yield to the event loop, that way
>>>   the GC shouldn't interrupt a turn while it's executing. I didn't
>>>   notice a difference doing this.
>>> - Running with --trace-deopt doesn't show any deoptimizations taking a
>>>   long time.
>>> - Running with --trace-opt shows some optimizations taking a long time,
>>>   but I don't fully understand the results. For example, on one turn
>>>   that took 37.33 ms (which was 9 times the average for that match),
>>>   1 ms was spent deoptimizing, and apparently 137 ms were spent
>>>   optimizing (20 ms to prepare, 115 ms to execute, and 2 ms to
>>>   finalize).
>>> - I ended up sprinkling perf_hooks.performance.now() calls all over my
>>>   code: one set was placed around all of the code that I execute on
>>>   behalf of the user, and one set was placed into individual function
>>>   calls that are called by the user. What I saw was that the time spent
>>>   in individual function calls didn't add up to the total amount of
>>>   time that I measured. To me, this indicated that the time spent was
>>>   not a direct result of my JavaScript code, but rather an indirect
>>>   result (e.g. optimization).
>>> - Running that same test with "--minimal" would not reproduce those
>>>   results. Instead, the sum of the individual times would indeed add up
>>>   to the total time measured.
>>> - I tried "warming up" the optimizations by running ~200 random
>>>   simulated matches at startup time, but I still ran into long-running
>>>   turns. This might have been more effective if, instead of randomly
>>>   simulating, I'd exhaustively simulated such that every line of code
>>>   was being hit.
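
Along the lines of the perf_hooks measurements described above, a minimal
sketch of timing one turn and flagging the ones that blow past the 50 ms
budget (perf_hooks is Node's built-in module; the bot name and the stand-in
update function are just illustrations):

  const { performance } = require('perf_hooks');

  const LONG_TURN_MS = 50; // the cutoff mentioned in the post

  // Times a single bot turn and reports any turn that exceeds the budget.
  // `updateFn` would be one of the eval'd global bot update functions.
  function runTimedTurn(botName, updateFn) {
    const start = performance.now();
    updateFn();
    const elapsed = performance.now() - start;
    if (elapsed > LONG_TURN_MS) {
      console.warn(`${botName} took ${elapsed.toFixed(2)} ms this turn`);
    }
    return elapsed;
  }

  // Example usage with a stand-in update function:
  const elapsed = runTimedTurn('bot1', () => { for (let i = 0; i < 1e6; i++) {} });
  console.log(`turn took ${elapsed.toFixed(2)} ms`);
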
>>>
>>> Questions
>>>
>>> - Is it possible to choose when to run optimizations? For example, I
>>>   could have them run in between turns, that way I don't count it
>>>   against a player when optimizations take a while. I realize that it
>>>   would mean that I'd be running unoptimized code for a turn.
>>> - Is it possible to tell exactly how long was spent optimizing from
>>>   within JavaScript? E.g.
>>>
>>>     const startTime = Date.now();
>>>     runSomeCode();
>>>     const elapsedTime = Date.now() - startTime;
>>>     const timeSpentOptimizing = aV8FunctionThatMayOrMayNotExist(); // can I do something for this?
>>>     const actualElapsedTime = elapsedTime - timeSpentOptimizing;
>>>
>>> - Are there any other V8 flags that may be of help here? I tried
>>>   "--predictable" without seeing any differences, but maybe there's
>>>   something I'm missing.
>>>
>>> Thanks for reading through this large blob of text!
>>> -Adam
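
As a possible alternative to wrapping the game in a website, Peter Wong's
--inspect suggestion can also be driven programmatically from inside the
process via Node's built-in inspector module (available since Node 8). A
minimal sketch, with the match loop stubbed out and the output file name
chosen arbitrarily; the resulting .cpuprofile can be loaded into the Chrome
DevTools Profiler tab mentioned above:

  const fs = require('fs');
  const inspector = require('inspector');

  // Stand-in for the match simulation loop described in the thread.
  function simulateMatch() {
    let x = 0;
    for (let i = 0; i < 1e7; i++) x += Math.sqrt(i);
    return x;
  }

  const session = new inspector.Session();
  session.connect();

  // Record a CPU profile around one match and write it to disk.
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      simulateMatch();
      session.post('Profiler.stop', (err, result) => {
        if (!err) {
          fs.writeFileSync('match.cpuprofile', JSON.stringify(result.profile));
        }
        session.disconnect();
      });
    });
  });
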