On Tue, 14 Feb 2023 03:20:14 GMT, Sergey Kuksenko <skukse...@openjdk.org> wrote:
>> [JDK-8269667](https://bugs.openjdk.org/browse/JDK-8269667) has uncovered the >> poor performance of BigDecimal.divide under certain circumstance. >> >> We confront similar situations when benchmarking Spark3 on TPC-DS test kit. >> According to the flame-graph below, it is StripZeros that spends most of the >> time of BigDecimal.divide. Hence we propose this patch to optimize stripping >> zeros. >>  >> >> Currently, createAndStripZerosToMatchScale() is performed linearly. That is, >> the target value is parsed from back to front, each time stripping out >> single ‘0’. To optimize, we can adopt the method of binary search. That is, >> each time we try to strip out ${scale/2} ‘0’s. >> >> The performance looks good. Therotically, time complexity of our method is >> O(log n), while the current one is O(n). In practice, benchmarks on Spark3 >> show that 1/3 less time (102s->68s) is spent on TPC-DS query4. We also runs >> Jtreg and JCK to check correctness, and it seems fine. >> >> More about environment: >> we run Spark3.3.0 on Openjdk11, but it seems jdk version doesn’t have much >> impact on BigDecimal. Spark cluster consists of a main node and 2 core >> nodes, each has 4cores, 16g memory and 4x500GB storage. > > The pr looks promising in terms of performance. > What makes sense to do: > > *) Don't rely on external benchmarks. It's fine if such exists, but anyway > set of microbenchmarks (using JMH) will be much better. More clear, readable > results, etc. E.g., it may show that other operations (for example, sqrt) > were speeded up too. > > *) Find boundaries. > "divideAndRemainder(bigTenToThe(scaleStep))" may produce non-zero reminder. > Find conditions when it happens. How big is performance regression in such > cases? > > Some other optimizations: > *) > Current code checks only the lowest bit (odd or even) to cut off indivisible > cases. > Making "divideAndRemainder(bigTenToThe(scaleStep))" - you make check > scaleStep lowest bits to do cut off. See "BigInteger.getLowestSetBit()" > > *) > BigInteger division by int value is faster. It's a special case. What is > faster, doing the single division by bigTenToThe(scaleStep) or doing several > divisions (split scaleStep to fit into int)? Exploration is required. @kuksenko Hi, I have updated the performance of this pr, I wonder if that's ok ------------- PR: https://git.openjdk.org/jdk/pull/12509