Maybe I found the root cause in the Spark docs:
"Unlimited precision decimal columns are no longer supported, instead Spark SQL
enforces a maximum precision of 38. When inferring schema from BigDecimal
objects, a precision of (38, 18) is now used. When no precision is specified in
DDL then the default remains Decimal(10, 0)."
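If that default applies to how Spark reads the table, the column may come in as
decimal(10,0) rather than decimal(38,18). As a quick check (just a sketch, with a
placeholder table name), you could compare what each engine reports for the
column type:

DESCRIBE some_table;   -- run in both the hive shell and spark-sql; the decimal type shown should match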
I found that Spark SQL loses the precision and handles the data as integers with
some rounding rule. Below is the data returned via the hive shell and spark-sql,
running the same SQL against the same Hive table:
Hive:
0.4
0.5
1.8
0.4
0.49
1.5
Spark SQL:
1
2
2
The rule seems to be: when the fractional part is < 0.5 it rounds down to 0, and
when the fractional part is >= 0.5 it rounds up to 1; i.e. the values are rounded
to integers.
The output is:
Spark SQL: 6828127
Hive: 6980574.1269
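If precision is being dropped during the aggregation, an explicit cast inside the
sum might be worth trying as a workaround (a sketch only, with placeholder
table/column names):

SELECT SUM(CAST(column_name AS DECIMAL(38,18))) FROM some_table;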
Sent from NetEase Mail Master
On 2016-04-20 18:06, FangFang Chen wrote:
Hi all,
Please give some suggestions. Thanks.
With the following SQL, Spark SQL and Hive give different results. The query
sums a decimal(38,18) column:
Select sum(column) from table;
column is defined as decimal(38,18).
Spark version: 1.5.3
Hive version: 2.0.0
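A minimal reproduction might look like this (hypothetical table name; create and
load it from the hive shell, then run the SELECT in both hive and spark-sql and
compare the results):

CREATE TABLE decimal_sum_test (val DECIMAL(38,18));
INSERT INTO TABLE decimal_sum_test VALUES (0.4), (0.49), (1.5);
SELECT SUM(val) FROM decimal_sum_test;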
Sent from NetEase Mail Master