On Mon, 6 Nov 2023 18:52:05 GMT, Sergey Bylokhov <s...@openjdk.org> wrote:
> Since we plan to import it into jdk22, do you have some performance data to
> share? any positive or negative effects of this migration?

There are three phases: (1) startup, (2) warmup and (3) warmed-up performance.

JNI has minimal startup / warmup cost, getting to warmed-up performance right away.
So if your app starts up and makes just one call to layout, JNI wins easily.
But if it keeps going, then FFM comes out ahead, even counting that startup / warmup cost.

There's a cost to the first time some code in the JDK initialises the core FFM.
If that code happens to be this layout code, it'll see that overhead.
That was somewhere around 75ms on my Mac.

On top of that there's the cost of creating the specific method handles and var handles.
I have 11 of these, and the total there is about 35-40ms.

So we have somewhere around a fixed 110-115ms startup cost for the FFM case - as measured on my Mac,
but only 35-40ms of that is attributable to the specific needs of layout.
And there is some potential for that code to get faster some day.
Also, if any of the techniques such as AppCDS, or some day, Leyden condensers, are used,
then there is also potential to eliminate much of the warmup cost.

The FFM path then needs to be warmed up.
Once warmed up, FFM is always as fast as or faster than JNI.
20% faster is typical, as measured by a small test that just calls layout in a loop.
It was tried with varying lengths of string.
For just a single char, FFM was only a little faster, but it gets better for longer strings.

Once we start to use layout, we use it a lot, so you reach many thousands of calls very quickly.
Just resizing your UI window causes that.
It doesn't take long for FFM to become an overall win.
That includes amortizing the cost of the startup / warmup time.

As well as a microbenchmark, I looked at what it does in an app consisting of a Swing JTextArea
displaying a decent amount of Hindi using an OpenType Indic font on Mac.
That takes just over 16,000 (!)
calls to layout to get to fully displayed.
Then if you just resize back and forth, in just a few seconds FFM catches up and overtakes.

I'll show numbers below - this measures all the FFM + layout costs but nothing else in the app.
It bears out what I said about startup.
"layoutCnt" is the number of calls to the method to do layout on a single run of text.
The numbers look like a lot of calls to layout and you might think that took hours,
but this really is just about 20-30 secs of manual resizing to get to one million calls.

JNI
==
layoutCnt=1 total=3ms <<< JNI very fast to start up
layoutCnt=2 total=3ms
layoutCnt=3 total=3ms
layoutCnt=4 total=4ms
layoutCnt=5 total=4ms
layoutCnt=1000 total=31ms
layoutCnt=2000 total=40ms << 9-10ms per thousand calls (40-31)
layoutCnt=3000 total=51ms
layoutCnt=4000 total=61ms
layoutCnt=5000 total=69ms
layoutCnt=6000 total=77ms
layoutCnt=7000 total=90ms
layoutCnt=8000 total=100ms
layoutCnt=9000 total=113ms
layoutCnt=10000 total=122ms
layoutCnt=11000 total=134ms
layoutCnt=12000 total=150ms
layoutCnt=13000 total=157ms
layoutCnt=14000 total=169ms
layoutCnt=15000 total=181ms
layoutCnt=16000 total=193ms <<< app fully displayed
...
layoutCnt=250000 total=2450ms <<< rough point at which they are equal
...
layoutCnt=1000000 total=9115ms <<< after 1 million calls JNI is clearly behind
layoutCnt=1001000 total=9124ms << STILL 9-10ms per thousand calls (9124-9115)

FFM
===
layoutCnt=1 total=186ms <<< FFM slow to start up, includes 75ms core FFM, 35-40ms var handles + no JIT yet
layoutCnt=2 total=188ms
layoutCnt=3 total=189ms
layoutCnt=4 total=195ms
layoutCnt=5 total=195ms
layoutCnt=1000 total=269ms
layoutCnt=2000 total=284ms << 15ms per thousand calls (284-269)
layoutCnt=3000 total=301ms
layoutCnt=4000 total=317ms
layoutCnt=5000 total=333ms
layoutCnt=6000 total=348ms
layoutCnt=7000 total=365ms
layoutCnt=8000 total=376ms
layoutCnt=9000 total=388ms
layoutCnt=10000 total=397ms
layoutCnt=11000 total=407ms
layoutCnt=12000 total=419ms
layoutCnt=13000 total=425ms
layoutCnt=14000 total=435ms
layoutCnt=15000 total=444ms
layoutCnt=16000 total=453ms <<< app fully displayed
...
layoutCnt=250000 total=2426ms <<< rough point at which they are equal
...
layoutCnt=1000000 total=8489ms <<< after 1 million calls FFM is clearly ahead
layoutCnt=1001000 total=8496ms << now about 7ms per thousand calls (8496-8489)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15476#issuecomment-1797025476