[ https://issues.apache.org/jira/browse/HIVE-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ganesha Shreedhara updated HIVE-21428:
--------------------------------------
    Summary: field delimiter property set at partition level is not getting respected when schema evolution/vectorized execution is enabled  (was: field delimiter property set at partition level is not getting respected when schema evolution is enabled)

> field delimiter property set at partition level is not getting respected when
> schema evolution/vectorized execution is enabled
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-21428
>                 URL: https://issues.apache.org/jira/browse/HIVE-21428
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Ganesha Shreedhara
>            Priority: Major
>
> *Steps to reproduce:*
> – Create a partitioned table
> {code:java}
> create external table src (c1 string, c2 string, c3 string) partitioned by
> (part string)
> location '/tmp/src';
> {code}
>
> – Create a data file with data in only 2 columns, separated by a tab, and
> put it in the table's external location
> {code:java}
> echo "d1\td2" >> data.txt;
> hadoop dfs -put data.txt /tmp/src/part=part1/;
> {code}
>
> – Recover the partition
> {code:java}
> MSCK REPAIR TABLE src;{code}
>
> – Alter the partition's properties so that the field delimiter is tab ('\t')
> {code:java}
> ALTER TABLE src PARTITION (part='part1')
> SET SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('columns'='c1,c2', 'column.types' ='string,string',
> 'field.delim'='\t');
> {code}
>
> – Now write the data from the src table to a dest table
> {code:java}
> create table dest (c1 string, c2 string, c3 string, c4 string);
> insert overwrite table dest select * from src;
> {code}
>
> – Retrieve data from the dest table
> {code:java}
> select * from dest; {code}
>
> *Result* (wrong)*:*
> d1 d2 NULL NULL part1
>
> – Now disable schema evolution, write the data again from the src table to
> the dest table, and retrieve the data
> {code:java}
> set hive.exec.schema.evolution=false;
> insert overwrite table dest select * from src;
> select * from dest;
> {code}
>
> *Result* (correct)*:*
> d1 d2 NULL part1
>
> In the wrong result, "d1\td2" is treated as a single column because the
> field delimiter used by the deserializer is *^A* instead of the *\t* that
> is set at partition level.
> It works fine if I alter the field delimiter of the serde for the entire
> table.
> So it looks like the serde properties in TableDesc take precedence over the
> serde properties in PartitionDesc. The issue occurs only when
> hive.exec.schema.evolution is enabled (it is enabled by default), and it is
> not present in 2.x versions.
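
The delimiter mismatch described in the report can be illustrated outside Hive. The following is only a minimal shell sketch, assuming a POSIX shell with awk available; awk's field splitting is used here as a stand-in for the deserializer and is not Hive code. It shows that the same line yields one field when split on ^A (octal 001) and two fields when split on tab, which matches the wrong and correct results above.

```shell
# Illustration only: awk stands in for the deserializer's field splitting.
# The sample line mirrors the data file from the report: "d1<TAB>d2".
line=$(printf 'd1\td2')

# Split on ^A (octal 001). The line contains no ^A, so it stays one field,
# which is why c1 receives "d1<TAB>d2" and the remaining columns are NULL.
fields_ctrl_a=$(printf '%s\n' "$line" | awk -F '\001' '{ print NF }')

# Split on tab, the delimiter set on the partition: two fields, d1 and d2.
fields_tab=$(printf '%s\n' "$line" | awk -F '\t' '{ print NF }')

echo "split on ^A:  $fields_ctrl_a field(s)"
echo "split on tab: $fields_tab field(s)"
```

With the tab delimiter the row maps to c1=d1, c2=d2; with ^A the whole string lands in c1, as in the wrong result.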