Takahiko Saito created HIVE-13943: ------------------------------------- Summary: Null is inserted into an existing partition after replacing a column of int with string of text formatted partitioned table Key: HIVE-13943 URL: https://issues.apache.org/jira/browse/HIVE-13943 Project: Hive Issue Type: Bug Affects Versions: 1.2.1, 2.1.0 Reporter: Takahiko Saito
Create a text formatted table with a int column partitioned by a string column. After replacing the columns of int with string and inserting a new row with the existing partition ('horton' in this case), null is inserted as a value in the altered column: {noformat} 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test; No rows affected (0.249 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (i int) partitioned by (s string); No rows affected (0.116 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (s) values(1, 'horton'); INFO : Session is already open INFO : Dag name: insert into table test partition...'horton')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (s=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-46_619_6747668726633461908-940/-ext-10000 INFO : Time taken for load dynamic partitions : 122 INFO : Loading partition {s=horton} INFO : Time taken for adding to write entity : 0 INFO : Partition default.test{s=horton} stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1] No rows affected (8.301 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+---------+--+ | test.i | test.s | +---------+---------+--+ | 1 | horton | +---------+---------+--+ 1 row selected (0.184 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace columns (i string); No rows affected (0.138 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (s) values ('horton', 'horton'); INFO : Session is already open INFO : Dag name: insert into table test partition...'horton')(Stage-1) INFO : INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (s=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-09-55_295_8631413609124947877-940/-ext-10000 INFO : Time taken for load dynamic partitions : 190 INFO : Loading partition {s=horton} INFO : Time taken for adding to write entity : 0 INFO : Partition default.test{s=horton} stats: [numFiles=2, numRows=2, totalSize=9, rawDataSize=7] No rows affected (1.35 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+---------+--+ | test.i | test.s | +---------+---------+--+ | 1 | horton | | NULL | horton | +---------+---------+--+ 2 rows selected (0.08 seconds) {noformat} The below is explain of insertion: {noformat} 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> explain insert into table test partition (s) values ('horton', 'horton'); +-------------------------------------------------------------------------------------------------------+--+ | Explain | +-------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-2 depends on stages: Stage-1 | | Stage-0 depends on stages: Stage-2 | | Stage-3 depends on stages: Stage-0 | | | | STAGE PLANS: | | Stage: Stage-1 | | Tez | | DagId: hive_20160603211130_4262d739-5bc1-4be0-95c3-2b666f5db7b8:1323 | | Vertices: | | Map 1 | | Map Operator Tree: | | TableScan | | alias: values__tmp__table__17 | | Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: tmp_values_col1 (type: string), tmp_values_col2 (type: string) | | outputColumnNames: _col0, _col1 | | Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE | | File Output Operator | | compressed: false | | Statistics: Num rows: 1 Data size: 14 Basic stats: COMPLETE Column stats: NONE | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | name: default.test | | | | Stage: Stage-2 | | Dependency Collection | | | | Stage: Stage-0 | | Move Operator | | tables: | | partition: | | s | | replace: false | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | name: default.test | | | | Stage: Stage-3 | | Stats-Aggr Operator | | | +-------------------------------------------------------------------------------------------------------+--+ {noformat] The issue is NOT seen with orc table. Also I tried with another table with int a string column partitioned by timestamp column, but the issue was NOT seen: {noformat} 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> drop table if exists test; No rows affected (0.284 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> create table test (s string) partitioned by (ts timestamp); No rows affected (0.093 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (ts) values ('1', '2034-08-04 17:42:59.0'); INFO : Session is already open INFO : Dag name: insert into table test partit...17:42:59.0')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (ts=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-17-35_097_7856579707078577394-940/-ext-10000 INFO : Time taken for load dynamic partitions : 132 INFO : Loading partition {ts=2034-08-04 17:42:59.0} INFO : Time taken for adding to write entity : 0 INFO : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=1, numRows=1, totalSize=2, rawDataSize=1] No rows affected (4.917 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+------------------------+--+ | test.s | test.ts | +---------+------------------------+--+ | 1 | 2034-08-04 17:42:59.0 | +---------+------------------------+--+ 1 row selected (0.104 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> alter table test replace columns (s int); No rows affected (0.104 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> insert into table test partition (ts) values ('2', '2034-08-04 17:42:59.0'); INFO : Session is already open INFO : Dag name: insert into table test partit...17:42:59.0')(Stage-1) INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1464727816747_0759) INFO : Map 1: 0/1 INFO : Map 1: 0(+1)/1 INFO : Map 1: 1/1 INFO : Loading data to table default.test partition (ts=null) from hdfs://ts-0531-5.openstacklocal:8020/apps/hive/warehouse/test/.hive-staging_hive_2016-06-03_21-17-40_287_4196012162926457792-940/-ext-10000 INFO : Time taken for load dynamic partitions : 252 INFO : Loading partition {ts=2034-08-04 17:42:59.0} INFO : Time taken for adding to write entity : 0 INFO : Partition default.test{ts=2034-08-04 17:42:59.0} stats: [numFiles=2, numRows=2, totalSize=4, rawDataSize=2] No rows affected (1.573 seconds) 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> 0: jdbc:hive2://ts-0531-1.openstacklocal:2181> select * from test; +---------+------------------------+--+ | test.s | test.ts | +---------+------------------------+--+ | 1 | 2034-08-04 17:42:59.0 | | 2 | 2034-08-04 17:42:59.0 | +---------+------------------------+--+ 2 rows selected (0.116 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)