Testing - Google Groups keeps deleting my responses, so I don't want to retype it all over if it's just going to be deleted again.
On Friday, January 18, 2019 at 1:17:38 PM UTC-8, Peter Wong wrote:
>
> Just wanted to add that you can access Chrome DevTools with Node.js
> "--inspect" option (depending on which version of Node). There's a
> Profiler tab instead of the Timeline tab, but it should get you the same
> data (Flame chart, bottom up, top down breakdowns).
>
> On Friday, January 18, 2019 at 2:58:21 PM UTC-6, Jakob Kummerow wrote:
>>
>> Answering what I can:
>>
>>> I changed my code from using a "while(true)" loop to calling
>>> "setImmediate" every once in a while so that it would yield to the
>>> event loop, that way the GC shouldn't interrupt a turn while it's
>>> executing. I didn't notice a difference doing this.
>>
>> Not seeing a difference there makes sense; the GC doesn't care about
>> yielding to the event loop.
>>
>>> apparently 137 ms were spent optimizing (20 ms to prepare, 115 ms to
>>> execute, and 2 ms to finalize).
>>
>> That means 20 ms were spent on the main thread some time in the past,
>> 115 ms in the background, and then 2 ms on the main thread just now (so
>> those 2 ms would be part of the 37 ms of that turn).
>>
>>> Running that same test with "--minimal"
>>
>> You mean V8's --minimal flag? That is absolutely not what you want. It's
>> a development tool that turns off everything except the bare minimum: no
>> more type feedback, no more optimizing compiler, >100x performance hit.
>> (And FWIW, that flag has recently been removed.)
>>
>>> I tried "--predictable" without seeing any differences
>>
>> Yes, --predictable is another development/debugging tool that is not
>> useful for production purposes. It turns off all sources of randomness,
>> which includes all threading. You would only see worse performance and
>> longer main-thread stalls from this.
>>
>>> Are there any other V8 flags that may be of help here?
>>
>> The default configuration is the recommended configuration for best
>> performance.
>>
>>> Is it possible to choose when to run optimizations?
>>
>> Effectively no. (There is the special intrinsic
>> %OptimizeFunctionOnNextCall, which does what the name suggests, but
>> optimization is only useful if the right type feedback is available, so
>> manually choosing when to optimize a given function only makes sense for
>> tests where you know exactly which function you want to have optimized
>> when; I can't imagine it being useful for your use case: especially for
>> user-provided code, there is no way to tell when it is ready for
>> optimization, and if you manually optimize it too early, it'll just get
>> deoptimized right away, so the manually triggered optimization would be
>> a waste of time and only introduce more/longer delays.)
>>
>> Based on the data you've provided, I'm not convinced that optimizations
>> are the reason for those long turns, but I also don't have a good guess
>> for what else it might be.
>>
>> If it were optimizations, then warmup would probably be the best
>> mitigation, but you report that warmup doesn't help, so it's probably
>> something else. The same reasoning speaks against the other
>> frequently-given advice, which is to avoid deoptimizations at least in
>> the code you control (while deopts themselves are fast, a deoptimized
>> function might get optimized again later, and that might take time).
>>
>> One thing you could try is wrapping the whole thing into a website so
>> you can use Chrome DevTools' profiling facilities. Maybe that'll make it
>> easier to drill down into those rarely-occurring delays.
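
For reference, a minimal sketch of how that %OptimizeFunctionOnNextCall
intrinsic is typically exercised in a standalone experiment. It only parses
when Node/V8 is started with --allow-natives-syntax, and (as noted above) it
is a testing aid, not something to ship; the update function below is just a
throwaway stand-in:

  // Run with: node --allow-natives-syntax <this file>
  function update() {
    let total = 0;
    for (let i = 0; i < 1000; i++) total += i;
    return total;
  }

  update();                             // run once so V8 can gather type feedback
  %OptimizeFunctionOnNextCall(update);  // ask V8 to optimize on the next invocation
  update();                             // this call triggers the optimized compilation
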
>>
>> On Fri, Jan 18, 2019 at 12:19 PM Adam Damiano <onai...@gmail.com> wrote:
>>
>>> I've tried to be as descriptive as possible below without covering
>>> irrelevant details. The summary of the problem is that I think V8
>>> optimizations may be causing unpredictable processing spikes in my game
>>> that I need to be able to either defer or measure.
>>>
>>> Background
>>>
>>> - The code is closed-source, and there are a *lot* of moving parts, so
>>>   I'm wondering mostly about theory here and not necessarily
>>>   implementation. More on this in the "Questions" section at the bottom.
>>> - I'm making a game where players write code for bots that battle each
>>>   other. The game is turn-based, and each match can take thousands of
>>>   turns. All of these turns get simulated at once in a NodeJS process
>>>   (v10.15) running in a 1-vCPU Alpine container on AWS. After a
>>>   simulation is done, the battle is essentially replayed on the client
>>>   without any input from the player.
>>> - Each simulation is CPU-intensive given that it's just running
>>>   player-defined code. There is no network or disk utilization during
>>>   this time. The most important performance metrics for me are:
>>>   - How long (in milliseconds) does an average turn take to simulate?
>>>   - How long does the longest turn take to simulate?
>>> - The average is important so that I know how many turns to limit
>>>   matches to. The measured average over tens of thousands of real
>>>   matches was about 0.3 ms/turn, so I chose 3000 turns for the limit
>>>   since 0.3 ms/turn * 3000 turns is still less than a second (which I
>>>   consider to be a reasonable amount of time for players to have to
>>>   wait for a result).
>>> - The longest turn is important so that I can figure out when to stop
>>>   executing scripts that will probably never complete on their own. For
>>>   example, if someone writes an infinite loop, then I need to cut the
>>>   execution off *somewhere*. I chose 50 ms for this.
>>> - Code is executed using "eval" (I know about the security risks and
>>>   have hopefully mitigated any attacks, but I want this issue to stay
>>>   focused on performance rather than security). Here is a simplified
>>>   version of the steps that I run, but the summary is that I "eval" a
>>>   user-defined function once, then I can call that function multiple
>>>   times per match:
>>>   - Give each bot a "namespace" for the user to put their code into:
>>>
>>>     global.bot1 = {};
>>>     global.bot2 = {};
>>>     ...
>>>     global.botN = {};
>>>
>>>   - Get code from the user for a particular bot:
>>>
>>>     const exampleUserDefinedScript = `update = function() {
>>>       fireLasers();
>>>     };`;
>>>
>>>   - Transform it slightly to be able to place it into one of the
>>>     namespaces:
>>>
>>>     const exampleUserDefinedScript = `global.bot1.update = function() {
>>>       fireLasers();
>>>     };`;
>>>
>>>   - Evaluate that to put it into my global scope:
>>>
>>>     eval(exampleUserDefinedScript);
>>>
>>>   - Simulate a match in a loop using those global functions:
>>>
>>>     while (true) {
>>>       const bot = findNextBotToUpdate();
>>>       global[bot].update(); // note: this is what is timed to give me my performance metrics
>>>       if (gameEnded()) break;
>>>     }
>>>
>>>   - Free references to the global functions:
>>>
>>>     global.bot1 = null;
>>>     global.bot2 = null;
>>>     // etc.
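
For concreteness, here is the same sequence of steps stitched into one small,
runnable sketch. The names fireLasers, findNextBotToUpdate, and gameEnded come
from the post above; their bodies here are trivial stand-ins, since the real
game logic isn't shown:

  'use strict';

  const MAX_TURNS = 3000; // turn limit mentioned above

  // Stand-in for the game API that user code is allowed to call.
  global.fireLasers = () => {};

  // 1. Give each bot a namespace.
  global.bot1 = {};
  global.bot2 = {};

  // 2./3. User-supplied code, rewritten to assign into a bot's namespace.
  const bot1Script = `global.bot1.update = function() { fireLasers(); };`;
  const bot2Script = `global.bot2.update = function() { fireLasers(); };`;

  // 4. Evaluate once; the functions then exist for the whole match.
  eval(bot1Script);
  eval(bot2Script);

  // Stand-ins for turn scheduling and the end-of-game check.
  let turn = 0;
  const findNextBotToUpdate = () => (turn % 2 === 0 ? 'bot1' : 'bot2');
  const gameEnded = () => turn >= MAX_TURNS;

  // 5. Simulate the match; each global[bot].update() call is what gets timed.
  while (true) {
    const bot = findNextBotToUpdate();
    global[bot].update();
    turn += 1;
    if (gameEnded()) break;
  }

  // 6. Drop the references so the evaluated code can be garbage collected.
  global.bot1 = null;
  global.bot2 = null;
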
>>>
>>> Problem
>>>
>>> The problem I'm running into is that individual turns sometimes take
>>> longer than 50 ms. I investigated heavily (see next section) and I
>>> think that it's due to V8 optimizing on the main thread.
>>>
>>> Investigation
>>>
>>> - As mentioned, this is running in a container with no other processes,
>>>   so the only threads that could be interrupting would be from the OS
>>>   or Node itself.
>>> - The environment that I'm running in doesn't seem to matter too much.
>>>   Running on AWS in a 1-vCPU container is my production environment,
>>>   but even in development, I see turns that take a disproportionately
>>>   long time.
>>> - I ran the Node profiler
>>>   <https://nodejs.org/en/docs/guides/simple-profiling/>, but nothing
>>>   stood out to me. Then again, I could just be interpreting the results
>>>   incorrectly. I get something like this:
>>>
>>>     Statistical profiling result from converted, (6468 ticks, 126 unaccounted, 0 excluded).
>>>     [Summary]:
>>>       ticks  total  nonlib   name
>>>       3184   49.2%  96.2%    JavaScript
>>>          0    0.0%   0.0%    C++
>>>        314    4.9%   9.5%    GC
>>>       3158   48.8%           Shared libraries
>>>        126    1.9%           Unaccounted
>>>
>>> - When I run with "--trace-gc", I see that the garbage collector is
>>>   mostly running scavenges that take 0.0 ms. Regardless, I changed my
>>>   code from using a "while(true)" loop to calling "setImmediate" every
>>>   once in a while so that it would yield to the event loop, that way
>>>   the GC shouldn't interrupt a turn while it's executing. I didn't
>>>   notice a difference doing this.
>>> - Running with --trace-deopt doesn't show any deoptimizations taking a
>>>   long time.
>>> - Running with --trace-opt shows some optimizations taking a long time,
>>>   but I don't fully understand the results. For example, on one turn
>>>   that took 37.33 ms (which was 9 times the average for that match),
>>>   1 ms was spent deoptimizing, and apparently 137 ms were spent
>>>   optimizing (20 ms to prepare, 115 ms to execute, and 2 ms to
>>>   finalize).
>>> - I ended up sprinkling perf_hooks.performance.now() calls all over my
>>>   code: one set was placed around all of the code that I execute on
>>>   behalf of the user, and one set was placed into individual function
>>>   calls that are called by the user. What I saw was that the time spent
>>>   in individual function calls didn't add up to the total amount of
>>>   time that I measured. To me, this indicated that the time spent was
>>>   not a direct result of my JavaScript code, but rather an indirect
>>>   result (e.g. optimization).
>>> - Running that same test with "--minimal" would not reproduce those
>>>   results. Instead, the sum of the individual times would indeed add up
>>>   to the total time measured.
>>> - I tried "warming up" the optimizations by running ~200 random
>>>   simulated matches at startup time, but I still ran into long-running
>>>   turns. This might have been more effective if, instead of randomly
>>>   simulating, I'd exhaustively simulated such that every line of code
>>>   was being hit.
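
Along the lines of the perf_hooks measurements described above, a minimal
sketch of timing one turn and flagging the ones that blow past the 50 ms
budget (perf_hooks is Node's built-in module; the bot name and the stand-in
update function are just illustrations):

  const { performance } = require('perf_hooks');

  const LONG_TURN_MS = 50; // the cutoff mentioned in the post

  // Times a single bot turn and reports any turn that exceeds the budget.
  // `updateFn` would be one of the eval'd global bot update functions.
  function runTimedTurn(botName, updateFn) {
    const start = performance.now();
    updateFn();
    const elapsed = performance.now() - start;
    if (elapsed > LONG_TURN_MS) {
      console.warn(`${botName} took ${elapsed.toFixed(2)} ms this turn`);
    }
    return elapsed;
  }

  // Example usage with a stand-in update function:
  const elapsed = runTimedTurn('bot1', () => { for (let i = 0; i < 1e6; i++) {} });
  console.log(`turn took ${elapsed.toFixed(2)} ms`);
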
>>>
>>> Questions
>>>
>>> - Is it possible to choose when to run optimizations? For example, I
>>>   could have them run in between turns, that way I don't count it
>>>   against a player when optimizations take a while. I realize that it
>>>   would mean that I'd be running unoptimized code for a turn.
>>> - Is it possible to tell exactly how long was spent optimizing from
>>>   within JavaScript? E.g.
>>>
>>>     const startTime = Date.now();
>>>     runSomeCode();
>>>     const elapsedTime = Date.now() - startTime;
>>>     const timeSpentOptimizing = aV8FunctionThatMayOrMayNotExist(); // can I do something for this?
>>>     const actualElapsedTime = elapsedTime - timeSpentOptimizing;
>>>
>>> - Are there any other V8 flags that may be of help here? I tried
>>>   "--predictable" without seeing any differences, but maybe there's
>>>   something I'm missing.
>>>
>>> Thanks for reading through this large blob of text!
>>> -Adam
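
As a possible alternative to wrapping the game in a website, Peter Wong's
--inspect suggestion can also be driven programmatically from inside the
process via Node's built-in inspector module (available since Node 8). A
minimal sketch, with the match loop stubbed out and the output file name
chosen arbitrarily; the resulting .cpuprofile can be loaded into the Chrome
DevTools Profiler tab mentioned above:

  const fs = require('fs');
  const inspector = require('inspector');

  // Stand-in for the match simulation loop described in the thread.
  function simulateMatch() {
    let x = 0;
    for (let i = 0; i < 1e7; i++) x += Math.sqrt(i);
    return x;
  }

  const session = new inspector.Session();
  session.connect();

  // Record a CPU profile around one match and write it to disk.
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      simulateMatch();
      session.post('Profiler.stop', (err, result) => {
        if (!err) {
          fs.writeFileSync('match.cpuprofile', JSON.stringify(result.profile));
        }
        session.disconnect();
      });
    });
  });
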