[ https://issues.apache.org/jira/browse/HIVE-18080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang updated HIVE-18080:
------------------------------
    Comment: was deleted

(was: After the If expression is vectorized with AVX2 instructions, it seems to account for only 2.8% of all instructions and consumes very little CPU time (23 ms) when the warm-up iteration count is 5000 and the iteration count is 50000, so the variation of this test is too large. I will later increase the iterations of the JMH test to see whether the degradation still exists.
{code}
To specify different parameters, use:
 * - This command will use 10 warm-up iterations, 5 test iterations, and 2 forks. And it will
 * display the Average Time (avgt) in Microseconds (us)
 * - Benchmark mode. Available modes are:
 * [Throughput/thrpt, AverageTime/avgt, SampleTime/sample, SingleShotTime/ss, All/all]
 * - Output time unit. Available time units are: [m, s, ms, us, ns].
 * <p/>
 * $ java -jar target/benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench
 * -wi 10 -i 5 -f 2 -bm avgt -tu us
 */
{code})

> Performance degradation on VectorizedLogicBench#IfExprLongColumnLongColumnBench when AVX512 is enabled
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18080
>                 URL: https://issues.apache.org/jira/browse/HIVE-18080
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>         Attachments: IFExpression_AVX2_Instruction.png, log.logic.avx1.single.0, log_logic.avx1.part
>
>
> I used a Xeon(R) Platinum 8180 CPU to test the performance of [AVX512|https://en.wikipedia.org/wiki/AVX-512].
> {code}
> #cat /proc/cpuinfo |grep "model name"|head -n 1
> model name : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
> {code}
> Before that, I compiled Hive with JDK9, as JDK9 enables AVX512.
> I used the Hive microbenchmark (HIVE-10189) to evaluate the performance improvement.
> It seems performance improves (20%+) for the cases in {{VectorizedArithmeticBench}}, {{VectorizedComparisonBench}}, {{VectorizedLikeBench}} and {{VectorizedLogicBench}}, except {{VectorizedLogicBench#IfExprLongColumnLongColumnBench}}, {{VectorizedLogicBench#IfExprRepeatingLongColumnLongColumnBench}} and {{VectorizedLogicBench#IfExprLongColumnRepeatingLongColumnBench}}.
> When I used a Skylake CPU to evaluate the performance improvement of AVX512, I found the following results in VectorizedLogicBench:
> || ||AVX2 us/op||AVX512 us/op||(AVX2-AVX512)/AVX2||
> |ColAndColBench|122510|87014|28.9%|
> |IfExprLongColumnLongColumnBench|1325759|1436073|-8.3%|
> |IfExprLongColumnRepeatingLongColumnBench|1397447|1480450|-5.9%|
> |IfExprRepeatingLongColumnLongColumnBench|1401164|1483062|-5.9%|
> |NotColBench|77042.83|51513.28|33%|
> There is degradation in IfExprLongColumnLongColumnBench, IfExprLongColumnRepeatingLongColumnBench and IfExprRepeatingLongColumnLongColumnBench. I am very confused about why the IfExprLongColumnLongColumnBench cases degrade.
> Here we use {{taskset -cp 1 $pid}} to run the benchmark on a single core to avoid the impact of dynamic CPU frequency scaling.
> My script:
> {code}
> export JAVA_HOME=/home/zly/jdk-9.0.1/
> export PATH=$JAVA_HOME/bin:$PATH
> export LD_LIBRARY_PATH=/home/zly/jdk-9.0.1/mylib
> for i in 0 1 2; do
> java -server -XX:UseAVX=3 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx3.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> for i in 0 1 2; do
> java -server -XX:UseAVX=2 -jar benchmarks.jar org.apache.hive.benchmark.vectorization.VectorizedLogicBench * -wi 10 -i 20 -f 1 -bm avgt -tu us >log.logic.avx2.single.$i & export pid=$!
> taskset -cp 1 $pid
> wait $pid
> done
> {code}
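
The last column of the table in the description is (AVX2-AVX512)/AVX2, computed on the reported us/op averages. As a minimal sketch for recomputing that column, the helper below takes the two timings and prints the relative change as a percentage. The function name rel_change is hypothetical and is not part of Hive or JMH; the input figures are copied from the table, not newly measured.

```shell
#!/bin/sh
# rel_change AVX2_US AVX512_US   (hypothetical helper, for illustration only)
# Prints (AVX2 - AVX512) / AVX2 as a percentage with one decimal place.
# Positive: AVX512 needed less time per op. Negative: AVX512 needed more.
rel_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# us/op figures copied from the table in the description.
rel_change 122510 87014       # ColAndColBench
rel_change 1325759 1436073    # IfExprLongColumnLongColumnBench
rel_change 1397447 1480450    # IfExprLongColumnRepeatingLongColumnBench
rel_change 1401164 1483062    # IfExprRepeatingLongColumnLongColumnBench
rel_change 77042.83 51513.28  # NotColBench
```

Values recomputed this way can differ from the table in the last digit, presumably because the reported percentages were rounded by hand. The sign and rough size of each change are the same.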