On Fri, 10 Feb 2023 10:00:05 GMT, Xiaowei Lu <d...@openjdk.org> wrote:

> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the 
> poor performance of BigDecimal.divide under certain circumstance.
> 
> We confront similar situations when benchmarking Spark3 on TPC-DS test kit. 
> According to the flame-graph below, it is StripZeros that spends most of the 
> time of BigDecimal.divide. Hence we propose this patch to optimize stripping 
> zeros.
> ![图片 
> 1](https://user-images.githubusercontent.com/39413832/218062061-53cd0220-776e-4b72-8b9a-6b0f11707986.png)
> 
> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, 
> the target value is parsed from back to front, each time stripping out single 
> ‘0’. To optimize, we can adopt the method of binary search. That is, each 
> time we try to strip out ${scale/2} ‘0’s. 
> 
> The performance looks good. Therotically, time complexity of our method is 
> O(log n), while the current one is O(n). In practice, benchmarks on Spark3 
> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs 
> Jtreg and JCK to check correctness, and it seems fine.
> 
> More about environment: 
> we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much 
> impact on BigDecimal. Spark cluster consists of a main node and 2 core nodes, 
> each has 4cores, 16g memory and 4x500GB storage.

The pr looks promising in terms of performance.
What makes sense to do:

*)  Don't rely on external benchmarks. It's fine if such exists, but anyway set 
of microbenchmarks (using JMH) will be much better. More clear, readable 
results, etc. E.g., it may show that other operations (for example, sqrt) were 
speeded up too.

*) Find boundaries. 
"divideAndRemainder(bigTenToThe(scaleStep))" may produce non-zero reminder. 
Find conditions when it happens. How big is performance regression in such 
cases?

Some other optimizations:
*)
Current code checks only the lowest bit (odd or even) to cut off indivisible 
cases.
Making "divideAndRemainder(bigTenToThe(scaleStep))"  - you make check scaleStep 
lowest bits to do cut off. See "BigInteger.getLowestSetBit()"

*)
BigInteger division by int value is faster. It's a special case. What is 
faster, doing the single division by bigTenToThe(scaleStep) or doing several 
divisions (split scaleStep to fit into int)? Exploration is required.

-------------

PR: https://git.openjdk.org/jdk/pull/12509

Reply via email to