On Sun, 17 Sep 2023 16:01:33 GMT, Shaojin Wen <d...@openjdk.org> wrote:
> @cl4es made performance optimizations for the simple specifiers of > String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the > same idea, I continued to make improvements. I made patterns like %2d %02d > also be optimized. > > The following are the test results based on MacBookPro M1 Pro: > > > -Benchmark Mode Cnt Score Error Units > -StringFormat.complexFormat avgt 15 1862.233 ? 217.479 ns/op > -StringFormat.int02Format avgt 15 312.491 ? 26.021 ns/op > -StringFormat.intFormat avgt 15 84.432 ? 4.145 ns/op > -StringFormat.longFormat avgt 15 87.330 ? 6.111 ns/op > -StringFormat.stringFormat avgt 15 63.985 ? 11.366 ns/op > -StringFormat.stringIntFormat avgt 15 87.422 ? 0.147 ns/op > -StringFormat.widthStringFormat avgt 15 250.740 ? 32.639 ns/op > -StringFormat.widthStringIntFormat avgt 15 312.474 ? 16.309 ns/op > > +Benchmark Mode Cnt Score Error Units > +StringFormat.complexFormat avgt 15 740.626 ? 66.671 ns/op > (+151.45) > +StringFormat.int02Format avgt 15 131.049 ? 0.432 ns/op > (+138.46) > +StringFormat.intFormat avgt 15 67.229 ? 4.155 ns/op > (+25.59) > +StringFormat.longFormat avgt 15 66.444 ? 0.614 ns/op > (+31.44) > +StringFormat.stringFormat avgt 15 62.619 ? 4.652 ns/op (+2.19) > +StringFormat.stringIntFormat avgt 15 89.606 ? 13.966 ns/op (-2.44) > +StringFormat.widthStringFormat avgt 15 52.462 ? 15.649 ns/op > (+377.95) > +StringFormat.widthStringIntFormat avgt 15 101.814 ? 3.147 ns/op > (+206.91) I enhanced parse fast-path to support more specifiers, including: % flag_1 width_1 % flag_2 % width_2 % width_1 . precesion_1 now benchmark on macbook m1 pro result is: -Benchmark Mode Cnt Score Error Units (optimized) -StringFormat.complexFormat avgt 15 2049.387 ? 121.539 ns/op -StringFormat.flags2Format avgt 15 430.964 ? 2.414 ns/op -StringFormat.flagsFormat avgt 15 257.851 ? 23.833 ns/op -StringFormat.stringFormat avgt 15 63.564 ? 10.490 ns/op -StringFormat.stringIntFormat avgt 15 88.111 ? 0.678 ns/op -StringFormat.width2Format avgt 15 349.304 ? 31.349 ns/op -StringFormat.width2PrecisionFormat avgt 15 464.621 ? 53.918 ns/op -StringFormat.widthFormat avgt 15 301.997 ? 34.974 ns/op -StringFormat.widthPrecisionFormat avgt 15 484.526 ? 38.098 ns/op -StringFormat.widthStringFormat avgt 15 235.421 ? 32.955 ns/op -StringFormat.widthStringIntFormat avgt 15 315.178 ? 15.154 ns/op +Benchmark Mode Cnt Score Error Units +StringFormat.complexFormat avgt 15 702.407 ? 85.481 ns/op (+191.77) +StringFormat.flags2Format avgt 15 329.551 ? 1.610 ns/op (+30.78) +StringFormat.flagsFormat avgt 15 125.798 ? 1.109 ns/op (+104.98) +StringFormat.stringFormat avgt 15 60.029 ? 6.275 ns/op (+5.89) +StringFormat.stringIntFormat avgt 15 89.020 ? 0.575 ns/op (-1.03) +StringFormat.width2Format avgt 15 135.743 ? 0.643 ns/op (+157.33) +StringFormat.width2PrecisionFormat avgt 15 351.408 ? 21.031 ns/op (+32.22) +StringFormat.widthFormat avgt 15 208.843 ? 47.504 ns/op (+44.61) +StringFormat.widthPrecisionFormat avgt 15 354.375 ? 67.314 ns/op (+36.73) +StringFormat.widthStringFormat avgt 15 74.846 ? 19.604 ns/op (+214.55) +StringFormat.widthStringIntFormat avgt 15 101.638 ? 0.961 ns/op (+210.10) > I was worried this would sprawl out more, but perhaps ~230 lines of code is a > reasonable extra weight to make the long tail of `String.format`'s regex-free. > > I was going to comment that the flag parsing was broken in > [f303f29](https://github.com/openjdk/jdk/commit/f303f2959d108d993dc03e86a27ef42bb892647f) > but it seems that it was fixed in the latest. I think we need to make a > review pass over all existing tests to make sure all imaginable variants are > covered. > > The parser code also ought to be shared between `Formatter` and > `FormatProcessor` so that there's a single source of truth going forward. The codes of Formatter and FormatProcessor have been regex-free. There are many changes and require more detailed review. I will delete redundant performance tests later. and I will delete redundant performance tests ,The current results are as follows : # Performance Numbers ## 1. [aliyun_ecs_c8i.xlarge](https://help.aliyun.com/document_detail/25378.html#c8i) * cpu : intel xeon sapphire rapids (x64) * os debian linux -Benchmark Mode Cnt Score Error Units (baseline) -StringFormat.complexFormat avgt 15 1426.696 ? 18.469 ns/op -StringFormat.flags2Format avgt 15 164.141 ? 2.264 ns/op -StringFormat.flagsFormat avgt 15 169.313 ? 6.616 ns/op -StringFormat.stringFormat avgt 15 34.710 ? 0.075 ns/op -StringFormat.stringIntFormat avgt 15 85.152 ? 0.337 ns/op -StringFormat.width2Format avgt 15 242.483 ? 5.586 ns/op -StringFormat.width2PrecisionFormat avgt 15 282.838 ? 2.564 ns/op -StringFormat.widthFormat avgt 15 175.460 ? 4.458 ns/op -StringFormat.widthPrecisionFormat avgt 15 244.593 ? 3.605 ns/op -StringFormat.widthStringFormat avgt 15 144.487 ? 5.271 ns/op -StringFormat.widthStringIntFormat avgt 15 223.913 ? 6.387 ns/op +Benchmark Mode Cnt Score Error Units (59c2983b) +StringFormat.complexFormat avgt 15 582.650 ? 17.399 ns/op (+144.87) +StringFormat.flags2Format avgt 15 74.214 ? 0.703 ns/op (+121.18) +StringFormat.flagsFormat avgt 15 67.764 ? 0.572 ns/op (+149.86) +StringFormat.stringFormat avgt 15 34.659 ? 0.201 ns/op (+0.15) +StringFormat.stringIntFormat avgt 15 84.448 ? 0.532 ns/op (+0.84) +StringFormat.width2Format avgt 15 123.012 ? 0.513 ns/op (+97.13) +StringFormat.width2PrecisionFormat avgt 15 148.092 ? 1.273 ns/op (+90.99) +StringFormat.widthFormat avgt 15 69.575 ? 1.023 ns/op (+152.19) +StringFormat.widthPrecisionFormat avgt 15 116.187 ? 0.938 ns/op (+110.52) +StringFormat.widthStringFormat avgt 15 48.389 ? 0.298 ns/op (+198.60) +StringFormat.widthStringIntFormat avgt 15 103.617 ? 2.204 ns/op (+116.10) ## 2. [aliyun_ecs_c8y.xlarge](https://help.aliyun.com/document_detail/25378.html#c8y) * cpu : aliyun yitian 710 (aarch64) * os debian linux -Benchmark Mode Cnt Score Error Units (baseline) -StringFormat.complexFormat avgt 15 2321.319 ? 9.624 ns/op -StringFormat.flags2Format avgt 15 310.377 ? 10.367 ns/op -StringFormat.flagsFormat avgt 15 295.118 ? 8.645 ns/op -StringFormat.stringFormat avgt 15 55.966 ? 0.949 ns/op -StringFormat.stringIntFormat avgt 15 157.949 ? 2.972 ns/op -StringFormat.width2Format avgt 15 380.621 ? 11.301 ns/op -StringFormat.width2PrecisionFormat avgt 15 447.285 ? 7.323 ns/op -StringFormat.widthFormat avgt 15 312.622 ? 5.104 ns/op -StringFormat.widthPrecisionFormat avgt 15 407.196 ? 6.466 ns/op -StringFormat.widthStringFormat avgt 15 248.538 ? 2.356 ns/op -StringFormat.widthStringIntFormat avgt 15 416.661 ? 6.685 ns/op +Benchmark Mode Cnt Score Error Units (59c2983b) +StringFormat.complexFormat avgt 15 930.922 ? 91.995 ns/op (+149.36) +StringFormat.flags2Format avgt 15 132.746 ? 10.809 ns/op (+133.82) +StringFormat.flagsFormat avgt 15 119.267 ? 11.709 ns/op (+147.45) +StringFormat.stringFormat avgt 15 55.820 ? 0.324 ns/op (+0.27) +StringFormat.stringIntFormat avgt 15 154.045 ? 7.327 ns/op (+2.54) +StringFormat.width2Format avgt 15 177.655 ? 4.797 ns/op (+114.25) +StringFormat.width2PrecisionFormat avgt 15 236.680 ? 4.266 ns/op (+88.99) +StringFormat.widthFormat avgt 15 132.043 ? 15.730 ns/op (+136.76) +StringFormat.widthPrecisionFormat avgt 15 204.085 ? 10.300 ns/op (+99.53) +StringFormat.widthStringFormat avgt 15 106.971 ? 5.527 ns/op (+132.35) +StringFormat.widthStringIntFormat avgt 15 215.329 ? 3.786 ns/op (+93.50) > Please don't pile on new refactorings and improvements on a PR that has been > opened for review. Better to let things brew as a draft for a bit if you're > not sure you're done before opening the PR for review, then once it's been > opened (like this one) consider preparing follow-up PR instead of refactoring > as you go. > > Specifically I'm not sure > [0d977b2](https://github.com/openjdk/jdk/commit/0d977b2febe455f4535e6ee2cb19d3b168d764e3) > is a good idea and would like you to roll those changes back. Object pooling > for trivial, short-lived objects are considered an anti-pattern, as they add > references to old GC generations and share many of the same drawbacks as > lookup tables, such as increased cache traffic. Showing great wins on > microbenchmarks while being a wash or even regressing real applications. Sorry, I will pay attention to it in the future and modify it in the open review code. I have revert commit to #0d977b2. I agree with your view on the performance issues of old reference. int parse(List<FormatString> al, char first, int start, String s, int max) java.util.Formatter::parse (469 bytes) failed to inline: callee is too large The reason why I split it into multiple small methods is to avoid a single method codeSize > 325. After merging small methods, the performance will decrease. I know, so I'm asking for your opinion, and if you don't agree, I won't submit big changes. There have been a lot of changes now, and I will continue to complete this PR based on the current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1723733247 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1730164585 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1731163579 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1732567054 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1733490226 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1740015776