Hi,

It worked for me when I added *"field.delim"='\t'* to SERDEPROPERTIES in lieu of *terminated by '\t'*. (This was after looking at LazySimpleSerDe <https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java#L417>, LazySerDeParameters <https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySerDeParameters.java#L240>, and serdeConstants <https://github.com/apache/hive/blob/master/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java#L59>.)
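
For illustration, a sketch of how the table from the original question might look with the delimiter moved into SERDEPROPERTIES. The column list, the 'XX' location placeholder, and the table properties are copied from the question below; the statement as a whole is an untested assumption about how the pieces combine, not a verified definition.

```sql
-- Sketch only: the table definition from the original question, with the tab
-- delimiter supplied as the "field.delim" SerDe property instead of
-- FIELDS TERMINATED BY '\t'. 'XX' is the location placeholder from the question.
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
  `CC` string,
  `SRT` string,
  `P C` string,
  `Year` string,
  `Month` string,
  `Address` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  "serialization.encoding"='GBK',
  "field.delim"='\t'
)
STORED AS TEXTFILE
LOCATION 'XX'
TBLPROPERTIES (
  "serialization.null.format"="",
  "skip.header.line.count"="1"
);
```

Running DESCRIBE FORMATTED table1 afterwards should list both properties under the storage description, which is a quick way to check that the delimiter was picked up.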
However, it also worked when I added "field.delim" to TBLPROPERTIES instead of SERDEPROPERTIES, which suggests that table properties reach the SerDe as well. So maybe you can try:

create ... row format delimited fields terminated by '\t' ... TBLPROPERTIES("serialization.encoding"='GBK',....);

HTH (and let me know if it works)

Gabriel Balan

On 12/10/2015 5:53 PM, mahender bigdata wrote:
Hi,

I need help reading a Unicode file. I have created an external table on top of my file:

CREATE EXTERNAL TABLE IF NOT EXISTS table1(`CC` string, `SRT` string, `P C` string, `Year` string, `Month` string, `Address` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES("serialization.encoding"='GBK')
STORED AS TEXTFILE
LOCATION 'XX'
tblproperties("serialization.null.format"="","skip.header.line.count"="1");

Here I'm using "serialization.encoding"='GBK' as the encoding, together with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'. Is there a way to specify my field delimiter '\t', as in row format delimited fields terminated by '\t', along with the encoding?

My second question: is there any SerDe available for the encoding 'UCS-2 BE BOM', or any universal SerDe that accepts any encoding?

Thanks in advance.
-- The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.