Hi

It worked for me when adding *"field.delim"='\t'* to SERDEPROPERTIES in lieu of 
*terminated by '\t'*.
(I found this after looking at LazySimpleSerDe 
<https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java#L417>,
 LazySerDeParameters 
<https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySerDeParameters.java#L240>,
 and serdeConstants 
<https://github.com/apache/hive/blob/master/serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java#L59>).
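
Applied to the table from the original question quoted below, the SERDEPROPERTIES variant might look like the following. This is only a sketch: the table name, columns, and the 'XX' location placeholder are copied from that question, and the only change is the added "field.delim" entry.

```sql
-- Sketch: keep the explicit SerDe and encoding, and pass the tab delimiter
-- as a SerDe property instead of using ROW FORMAT DELIMITED.
-- 'XX' is the location placeholder from the original question.
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
  `CC` string, `SRT` string, `P C` string,
  `Year` string, `Month` string, `Address` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ("serialization.encoding"='GBK', "field.delim"='\t')
STORED AS TEXTFILE LOCATION 'XX'
TBLPROPERTIES ("serialization.null.format"="", "skip.header.line.count"="1");
```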

However, it also worked when I added "field.delim" to TBLPROPERTIES instead of 
SERDEPROPERTIES. So maybe you can try:

   create ... row format delimited fields terminated by '\t' ...
   TBLPROPERTIES("serialization.encoding"='GBK',....);
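
Spelled out against the same table, that suggestion might look like the following. Treat it as an untested sketch: it assumes the encoding is picked up from TBLPROPERTIES the same way "field.delim" was, and it reuses the columns, the 'XX' location placeholder, and the other table properties from the original question.

```sql
-- Sketch: use ROW FORMAT DELIMITED for the tab delimiter and move the
-- encoding into TBLPROPERTIES (assumption: the SerDe reads it from there).
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
  `CC` string, `SRT` string, `P C` string,
  `Year` string, `Month` string, `Address` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE LOCATION 'XX'
TBLPROPERTIES ("serialization.encoding"='GBK',
               "serialization.null.format"="",
               "skip.header.line.count"="1");
```

A quick check is to select a few rows and confirm that the columns split on tabs and that the GBK text is readable.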


HTH (and let me know if it works)
Gabriel Balan

On 12/10/2015 5:53 PM, mahender bigdata wrote:
Hi,

I need help reading a Unicode file. I have created an external table on top of 
my file:

CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
  `CC` string, `SRT` string, `P C` string,
  `Year` string, `Month` string, `Address` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ("serialization.encoding"='GBK')
STORED AS TEXTFILE LOCATION 'XX'
TBLPROPERTIES ("serialization.null.format"="", "skip.header.line.count"="1");

Here I'm using "serialization.encoding"='GBK' as the encoding, together with ROW FORMAT 
SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'. Is there a way to also specify 
my field terminator '\t', as in row format delimited fields terminated by '\t', along with 
the encoding?

My second question: is there any SerDe available for the encoding 'UCS-2 BE BOM', or 
any universal SerDe that accepts any encoding?



Thanks in advance.



--
The statements and opinions expressed here are my own and do not necessarily 
represent those of Oracle Corporation.
