On Sun, 17 Sep 2023 16:01:33 GMT, Shaojin Wen <d...@openjdk.org> wrote:
> @cl4es made performance optimizations for the simple specifiers of > String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the > same idea, I continued to make improvements. I made patterns like %2d %02d > also be optimized. > > The following are the test results based on MacBookPro M1 Pro: > > > -Benchmark Mode Cnt Score Error Units > -StringFormat.complexFormat avgt 15 1862.233 ? 217.479 ns/op > -StringFormat.int02Format avgt 15 312.491 ? 26.021 ns/op > -StringFormat.intFormat avgt 15 84.432 ? 4.145 ns/op > -StringFormat.longFormat avgt 15 87.330 ? 6.111 ns/op > -StringFormat.stringFormat avgt 15 63.985 ? 11.366 ns/op > -StringFormat.stringIntFormat avgt 15 87.422 ? 0.147 ns/op > -StringFormat.widthStringFormat avgt 15 250.740 ? 32.639 ns/op > -StringFormat.widthStringIntFormat avgt 15 312.474 ? 16.309 ns/op > > +Benchmark Mode Cnt Score Error Units > +StringFormat.complexFormat avgt 15 740.626 ? 66.671 ns/op > (+151.45) > +StringFormat.int02Format avgt 15 131.049 ? 0.432 ns/op > (+138.46) > +StringFormat.intFormat avgt 15 67.229 ? 4.155 ns/op > (+25.59) > +StringFormat.longFormat avgt 15 66.444 ? 0.614 ns/op > (+31.44) > +StringFormat.stringFormat avgt 15 62.619 ? 4.652 ns/op (+2.19) > +StringFormat.stringIntFormat avgt 15 89.606 ? 13.966 ns/op (-2.44) > +StringFormat.widthStringFormat avgt 15 52.462 ? 15.649 ns/op > (+377.95) > +StringFormat.widthStringIntFormat avgt 15 101.814 ? 3.147 ns/op > (+206.91) For maintainability purposes, we should strive for simplicity and readability. The parser looks good (but see the small proposed enhancements). I'll continue with the rest tomorrow. You might consider this alternative, which IMO is simpler and more readable. int parse() { // %[argument_index$][flags][width][.precision][t]conversion // %(\d+$)?([-#+ 0,(<]*)?(\d+)?(.\d+)?([tT])?([a-zA-Z%]) parseArgument(); parseFlag(); parseWidth(); int precisionSize = parsePrecision(); if (precisionSize < 0) { return 0; } // ([tT])?([a-zA-Z%]) char t = '\0', conversion = '\0'; if ((c == 't' || c == 'T') && off + 1 < max) { char c1 = s.charAt(off + 1); if (isConversion(c1)) { t = c; conversion = c1; off += 2; } } if (conversion == '\0' && isConversion(c)) { conversion = c; ++off; } if (argSize + flagSize + widthSize + precisionSize + t + conversion != 0) { if (al != null) { FormatSpecifier formatSpecifier = new FormatSpecifier(s, start, argSize, flagSize, widthSize, precisionSize, t, conversion); al.add(formatSpecifier); } return off - start; } return 0; } private void parseArgument() { // (\d+$)? int i = off; for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body if (i == max || c != '$') { c = first; return; } ++i; // skip '$' if (i < max) { c = s.charAt(i); } argSize = i - off; off = i; } private void parseFlag() { // ([-#+ 0,(<]*)? int i = off; for (; i < max && Flags.isFlag(c = s.charAt(i)); ++i); // empty body flagSize = i - off; off = i; } private void parseWidth() { // (\d+)? int i = off; for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body widthSize = i - off; off = i; } private int parsePrecision() { // (.\d+)? if (c != '.') { return 0; } int i = off + 1; for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body if (i == off + 1) { // a '.' but no digits return -1; } int size = i - off; off = i; return size; } Meaningful external reviews take a _lot_ of engineering time on such a popular platform as the JDK. They make sense only on somehow stable code, otherwise they are a waste of human resources. So sure, take your time to stabilize your code, make your tests, but please refrain from keep on changing too much once you publish a PR. If reviewers have to face continuous changes, they might become uninterested in reviewing. src/java.base/share/classes/java/util/Formatter.java line 2899: > 2897: if (c == '.') { > 2898: // (\.\d+)? > 2899: precisionSize = parsePrecision(); `precisionSize == 0` has two different meanings: (1) there's a '.' not followed by digits and (2) there's no precision field at all. In the proposed [alternative](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458), case (1) is indicated by `-1`, while case (2) is indicated by `0`. Besides, the decision is taken in the `parsePrecision()` method, not partly here and partly in the method. src/java.base/share/classes/java/util/Formatter.java line 2918: > 2916: conversion = c; > 2917: ++off; > 2918: } Suggestion: } else if (isConversion(c)) { conversion = c; ++off; } src/java.base/share/classes/java/util/Formatter.java line 2933: > 2931: private void parseArgument() { > 2932: // (\d+\$)? > 2933: for (int size = 0; off < max; c = s.charAt(++off), size++) { This argument_index parser is incorrect. Say the invalid "specifier" is `"%12"`. The parser would throw a `StringIndexOutOfBoundsException` on `s.charAt(++off)`. src/java.base/share/classes/java/util/Formatter.java line 2934: > 2932: // (\d+\$)? > 2933: int i = off; > 2934: for (; i < max && isDigit(c); c = next(++i)); // empty body No need for `next()` if done as in the proposed alternative. src/java.base/share/classes/java/util/Formatter.java line 2947: > 2945: c = first; > 2946: } > 2947: } I think the logic in the proposed alternative is more readable. src/java.base/share/classes/java/util/Formatter.java line 2953: > 2951: // ([-#+ 0,(\<]*)? > 2952: int i = off; > 2953: for (; i < max && Flags.isFlag(c); c = next(++i)); // > empty body No need for `next()` if done as in the proposed alternative. src/java.base/share/classes/java/util/Formatter.java line 2961: > 2959: // (\d+)? > 2960: int i = off; > 2961: for (; i < max && isDigit(c); c = next(++i)); // empty body You are using `next()` here, but the simpler solution in `parsePrecision()`. I think consistency simplifies understanding. src/java.base/share/classes/java/util/Formatter.java line 2966: > 2964: > 2965: private int parsePrecision() { > 2966: // (\.\d+)? Since processing of `'.'` is done by the caller, the regex pattern in this comment is confusing. Either prefer handling `'.'` here (see [this](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458)), or adjust the regex in the comment. src/java.base/share/classes/java/util/Formatter.java line 2978: > 2976: } > 2977: > 2978: private char next(int off) { There's no real need for this more expensive method, compared with a simple `s.charAt(i)`, in the alternative proposed [before](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458). src/java.base/share/classes/java/util/Formatter.java line 3136: > 3134: int flagEnd = argEnd + flagSize; > 3135: int widthEnd = flagEnd + widthSize; > 3136: int precesionEnd = widthEnd + precisionSize; Suggestion: int precisionEnd = widthEnd + precisionSize; ------------- PR Review: https://git.openjdk.org/jdk/pull/15776#pullrequestreview-1647449239 PR Review: https://git.openjdk.org/jdk/pull/15776#pullrequestreview-1660250320 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1737986458 PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1739462969 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339852931 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1347671680 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1338962488 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339856693 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339856725 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339857403 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339844252 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1347679848 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339842414 PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339135086