On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu <d...@openjdk.org> wrote:
> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the > poor performance of BigDecimal.divide under certain circumstance. > > We confront similar situations when benchmarking Spark3 on TPC-DS test kit. > According to the flame-graph below, it is StripZeros that spends most of the > time of BigDecimal.divide. Hence we propose this patch to optimize stripping > zeros. >  > > Currently, createAndStripZerosToMatchScale() is performed linearly. That is, > the target value is parsed from back to front, each time stripping out single > ‘0’. To optimize, we can adopt the method of binary search. That is, each > time we try to strip out ${scale/2} ‘0’s. > > The performance looks good. Therotically, time complexity of our method is > O(log n), while the current one is O(n). In practice, benchmarks on Spark3 > show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs > Jtreg and JCK to check correctness, and it seems fine. > > More about environment: > we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much > impact on BigDecimal. Spark cluster consists of a main node and 2 core nodes, > each has 4cores, 16g memory and 4x500GB storage. After making sure that `intVal` is even, and before attempting a division by a power of 10, it might help to check if 5 divides `intVal` in the first place. If it doesn't, there no point in performing the division. It can be shown that 5 divides `intVal` if and only if it divides the `long` sum of all `int`s in the `mag` array of `intVal`. I didn't try out if this contributes to improve the overall performance, but it might be worth giving a try. ------------- PR: https://git.openjdk.org/jdk/pull/12509