On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu <d...@openjdk.org> wrote:

> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the 
> poor performance of BigDecimal.divide under certain circumstances.
> 
> We encountered similar situations when benchmarking Spark3 on the TPC-DS test 
> kit. According to the flame graph below, most of the time in BigDecimal.divide 
> is spent in StripZeros. Hence we propose this patch to optimize stripping 
> zeros.
> ![Image 
> 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> 
> Currently, createAndStripZerosToMatchScale() works linearly. That is, the 
> target value is scanned from back to front, stripping out a single '0' each 
> time. To optimize, we can adopt a binary-search-style approach. That is, each 
> time we try to strip out ${scale/2} '0's. 
> 
> The performance looks good. Theoretically, the time complexity of our method 
> is O(log n), while the current one is O(n). In practice, benchmarks on Spark3 
> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also ran 
> jtreg and JCK to check correctness, and the results look fine.
> 
> More about environment: 
> we run Spark3.3.0 on OpenJDK 11, but the JDK version doesn't seem to have much 
> impact on BigDecimal. The Spark cluster consists of a main node and 2 core 
> nodes, each with 4 cores, 16 GB memory and 4x500 GB storage.
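The binary-search stripping described above can be sketched roughly as follows. This is a hypothetical standalone illustration, not the actual patch: the method name `stripTrailingZeros` and its `maxStrip` budget parameter are made up for the sketch, whereas the real change lives inside BigDecimal's createAndStripZerosToMatchScale().

```java
import java.math.BigInteger;

public class StripZerosSketch {
    // Sketch of the idea: instead of dividing by 10 once per trailing zero,
    // try to remove maxStrip/2 zeros at a time with one divideAndRemainder
    // call, halving the guess whenever the remainder is nonzero.
    static BigInteger stripTrailingZeros(BigInteger intVal, int maxStrip) {
        while (maxStrip > 0 && intVal.signum() != 0) {
            int k = Math.max(1, maxStrip / 2);
            BigInteger[] qr = intVal.divideAndRemainder(BigInteger.TEN.pow(k));
            if (qr[1].signum() == 0) {
                intVal = qr[0];      // k zeros stripped in a single step
                maxStrip -= k;
            } else if (k == 1) {
                break;               // not even one trailing zero remains
            } else {
                maxStrip = k - 1;    // fewer than k zeros left; shrink the guess
            }
        }
        return intVal;
    }

    public static void main(String[] args) {
        // 123000000 has six trailing zeros; they are removed in O(log n) steps
        System.out.println(stripTrailingZeros(new BigInteger("123000000"), 9));
    }
}
```

With 123000000 and a budget of 9, the loop strips four zeros, then two, then stops, printing 123 after three division steps instead of six.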

> _Mailing list message from [Louis Wasserman](mailto:lowas...@google.com) on 
> [core-libs-dev](mailto:core-libs-...@mail.openjdk.org):_
> 
> Could you do that benchmark with e.g. JMH rather than taking the difference 
> of System.currentTimeMillis? That would probably make it easier to read and 
> trust the results.
> 
> On Sun, Feb 12, 2023, 7:56 PM Sergey Kuksenko <skuksenko at openjdk.org> 
> wrote:
> 

Sure. I have written a microbenchmark using JMH, and it shows:

before optimization 
DecimalBenchmark.divide_by_2         thrpt    5   334549.891 ±  3028.946  ops/s
DecimalBenchmark.divide_by_2_to_100  thrpt    5    15874.537 ±    38.942  ops/s
DecimalBenchmark.divide_by_3         thrpt    5  6190583.950 ± 91116.750  ops/s

after optimization 
DecimalBenchmark.divide_by_2         thrpt    5  1479106.480 ±  5950.440  ops/s
DecimalBenchmark.divide_by_2_to_100  thrpt    5    32730.634 ±    81.891  ops/s
DecimalBenchmark.divide_by_3         thrpt    5  6233700.252 ± 21043.571  ops/s


The following is the benchmark code

import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 10)
@Measurement(iterations = 5, time = 10)
@Fork(value = 1)
public class DecimalBenchmark {
    private static final MathContext mc = new MathContext(38, RoundingMode.HALF_UP);

    @Benchmark
    public BigDecimal divide_by_2_to_100() {
        // Return the last quotient so JMH keeps the computation alive
        // (a void method discarding results risks dead-code elimination)
        BigDecimal result = BigDecimal.ZERO;
        for (long i = 2; i <= 100; i++) {
            result = BigDecimal.ONE.divide(BigDecimal.valueOf(i), mc);
        }
        return result;
    }

    @Benchmark
    public BigDecimal divide_by_2() {
        return BigDecimal.ONE.divide(BigDecimal.valueOf(2L), mc);
    }

    @Benchmark
    public BigDecimal divide_by_3() {
        return BigDecimal.ONE.divide(BigDecimal.valueOf(3L), mc);
    }
}

-------------

PR: https://git.openjdk.org/jdk/pull/12509
