Re: Re: Re: Spark sql and hive into different result with same sql

2016-04-20 Thread FangFang Chen
Maybe I found the root cause in the Spark docs: "Unlimited precision decimal columns are no longer supported, instead Spark SQL enforces a maximum precision of 38. When inferring schema from BigDecimal objects, a precision of (38, 18) is now used. When no precision is specified in DDL then the defa
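For reference, a quick way to see what precision Spark SQL actually assigns to the Hive column is to print the table schema. This is only a sketch: it assumes a Hive-enabled Spark 1.5.x build, where spark-shell exposes sqlContext as a HiveContext, and my_table is a placeholder name rather than the table from the thread.

  // Paste into spark-shell; a decimal(38,18) Hive column should print as
  // DecimalType(38,18). Anything else would point at a schema-inference mismatch.
  sqlContext.table("my_table").printSchema()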

Re: Re: Spark sql and hive into different result with same sql

2016-04-20 Thread FangFang Chen
I found that Spark SQL loses precision and handles the data as int under some rule. Following is the data returned via the Hive shell and via Spark SQL, with the same SQL against the same Hive table: Hive: 0.4 0.5 1.8 0.4 0.49 1.5 Spark SQL: 1 2 2 The handling rule seems to be: when the fractional part is < 0.5 it goes to 0, when the fractional part
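One way to narrow down where the rounding happens is to read the raw column back through Spark SQL, once as stored and once with an explicit cast. This is a hypothetical probe, with my_table and amount as placeholder names, assuming spark-shell's sqlContext is a HiveContext (Hive-enabled 1.5.x build):

  // If the first query already shows 1, 2, 2 instead of 0.4, 0.5, 1.8, the
  // precision is lost on read, not in the aggregation.
  sqlContext.sql("SELECT amount FROM my_table LIMIT 10").show()
  sqlContext.sql("SELECT CAST(amount AS DECIMAL(38,18)) FROM my_table LIMIT 10").show()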

Re: Spark sql and hive into different result with same sql

2016-04-20 Thread FangFang Chen
The output is: Spark SQL: 6828127 Hive: 6980574.1269 Sent from NetEase Mail Master On 2016-04-20 18:06, FangFang Chen wrote: Hi all, please give some suggestions. Thanks. With the following same SQL, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column: Select sum(column) from table; column is defi

Spark sql and hive into different result with same sql

2016-04-20 Thread FangFang Chen
Hi all, please give some suggestions. Thanks. With the following same SQL, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column: Select sum(column) from table; column is defined as decimal(38,18). Spark version: 1.5.3 Hive version: 2.0.0 Sent from NetEase Mail Master
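To reproduce the comparison from the Spark side, the same aggregate can be run through a HiveContext, together with a commonly suggested variant that lowers the scale before summing so the 38-digit precision cap leaves more room for integer digits. This is a hedged sketch, not a confirmed fix; my_table and amount are placeholder names, and spark-shell's sqlContext is assumed to be a HiveContext (Hive-enabled Spark 1.5.x build).

  // Same aggregate the thread compares between Hive and Spark SQL.
  sqlContext.sql("SELECT SUM(amount) FROM my_table").show()
  // Variant: reduce the scale before aggregating to keep headroom within precision 38.
  sqlContext.sql("SELECT SUM(CAST(amount AS DECIMAL(38,6))) FROM my_table").show()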