On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu <d...@openjdk.org> wrote:
> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the poor performance of BigDecimal.divide under certain circumstances.
>
> We ran into similar situations when benchmarking Spark3 on the TPC-DS test kit. According to the flame graph below, it is StripZeros that takes up most of the time of BigDecimal.divide. Hence we propose this patch to optimize stripping zeros.
>
> 
>
> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, the target value is parsed from back to front, each time stripping out a single ‘0’. To optimize, we can adopt a binary-search-like approach. That is, each time we try to strip out ${scale/2} ‘0’s.
>
> The performance looks good. Theoretically, the time complexity of our method is O(log n), while the current one is O(n). In practice, benchmarks on Spark3 show that 1/3 less time (102s -> 68s) is spent on TPC-DS query4. We also ran jtreg and JCK to check correctness, and it seems fine.
>
> More about the environment: we run Spark 3.3.0 on OpenJDK 11, but the JDK version does not seem to have much impact on BigDecimal. The Spark cluster consists of a main node and 2 core nodes, each with 4 cores, 16 GB of memory and 4x500 GB of storage.

> _Mailing list message from [Louis Wasserman](mailto:lowas...@google.com) on [core-libs-dev](mailto:core-libs-...@mail.openjdk.org):_
>
> Could you do that benchmark with e.g. JMH rather than taking the difference of System.currentTimeMillis? That would probably make it easier to read and trust the results.

Sure. I have written a microbenchmark using JMH, and it shows the following results.

Before optimization:

```
Benchmark                             Mode  Cnt        Score       Error  Units
DecimalBenchmark.divide_by_2         thrpt    5   334549.891 ±  3028.946  ops/s
DecimalBenchmark.divide_by_2_to_100  thrpt    5    15874.537 ±    38.942  ops/s
DecimalBenchmark.divide_by_3         thrpt    5  6190583.950 ± 91116.750  ops/s
```

After optimization:

```
Benchmark                             Mode  Cnt        Score       Error  Units
DecimalBenchmark.divide_by_2         thrpt    5  1479106.480 ±  5950.440  ops/s
DecimalBenchmark.divide_by_2_to_100  thrpt    5    32730.634 ±    81.891  ops/s
DecimalBenchmark.divide_by_3         thrpt    5  6233700.252 ± 21043.571  ops/s
```

The following is the benchmark code:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 10)
@Measurement(iterations = 5, time = 10)
@Fork(value = 1)
public class DecimalBenchmark {

    private static final MathContext mc = new MathContext(38, RoundingMode.HALF_UP);

    @Benchmark
    public void divide_by_2_to_100() {
        for (long i = 2; i <= 100; i++) {
            BigDecimal divisor = BigDecimal.valueOf(i);
            BigDecimal.ONE.divide(divisor, mc);
        }
    }

    @Benchmark
    public void divide_by_2() {
        BigDecimal.ONE.divide(BigDecimal.valueOf(2L), mc);
    }

    @Benchmark
    public void divide_by_3() {
        BigDecimal.ONE.divide(BigDecimal.valueOf(3L), mc);
    }
}
```

-------------

PR: https://git.openjdk.org/jdk/pull/12509
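For readers who want a concrete picture of the binary-search-style stripping described in the quoted PR text above, here is a rough, self-contained sketch. The class name, method signature, and array-based return value are invented for this illustration and do not match the actual createAndStripZerosToMatchScale() change; the only point is that each iteration tries to divide out a whole block of 10^step trailing zeros at once and halves the stride when the remainder turns out to be non-zero.

```java
import java.math.BigInteger;

public class StripZerosSketch {

    /**
     * Illustrative only: strips trailing decimal zeros from intVal (the unscaled
     * value) without letting the scale drop below preferredScale, removing a
     * block of zeros per step instead of one digit at a time.
     */
    static BigInteger[] stripTrailingZeros(BigInteger intVal, long scale, long preferredScale) {
        long remaining = scale - preferredScale;   // most zeros we are allowed to strip
        long step = Math.max(1, remaining / 2);    // start by trying to strip half of them
        while (intVal.signum() != 0 && remaining > 0) {
            step = Math.min(step, remaining);
            // A real implementation must also guard against step exceeding int range.
            BigInteger[] qr = intVal.divideAndRemainder(BigInteger.TEN.pow((int) step));
            if (qr[1].signum() == 0) {
                // The low 'step' digits were all zero: strip them in one shot.
                intVal = qr[0];
                scale -= step;
                remaining -= step;
            } else if (step == 1) {
                // Not even divisible by 10: no trailing zeros are left.
                break;
            } else {
                // Tried to strip too many at once: halve the stride and retry.
                step /= 2;
            }
        }
        return new BigInteger[] { intVal, BigInteger.valueOf(scale) };
    }
}
```

Because a failed oversized stride costs only one divideAndRemainder call before the stride is halved, the number of BigInteger divisions grows roughly with log(scale) rather than with the number of trailing zeros, which matches the O(log n) versus O(n) claim in the description.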