Hey all,

I am using Hive 2.0 with an external metastore on EMR-5.0.0, with Tez as the execution engine. Our data is stored in JSON format, and for serialization and deserialization we are planning to use the lazy SerDe (class name 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe').

My table definition is:

CREATE EXTERNAL TABLE IF NOT EXISTS daily_active_users_summary_json_partition_dt_paths_v1 (
  uid string,
  city string,
  user string,
  songcount string,
  songid_list array<string>
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('paths'='uid,city,user,songcount,songid_list')
LOCATION 's3://<bucketname removed>/users/daily_active_users_summary_json_partition_dt';

The data looks like this (one record from the table):

{"uid":"xxxxxxyyyy","listening_user_flag":"non_listening","platform":"android","model":"micromax a110q","aquisition_channel":"organic","state":"delhi","app_version":"3.2:","country":"IN","city":"new delhi","new_listening_user_flag":"non_listening","manufacturer":"Micromax","login_mode":"loggedout","new_user_flag":"returning","digital_channel":"Not Source"}

Now, when I run the query

select * from daily_active_users_summary_json_partition_dt_paths_v1 limit 5;

the first field of the table contains the complete record and the rest of the fields show up as NULL.

When I use a different SerDe, 'org.apache.hive.hcatalog.data.JsonSerDe', the same query works fine and the data is deserialized correctly.

We want to use the lazy SerDe because our data contains non-UTF-8 characters, and the latter SerDe does not support serialization/deserialization of non-UTF-8 characters.

Can you please help me solve this? We would strongly prefer to use the lazy SerDe, as we have already experimented with other SerDes and none of them works for us.

Is there any configuration that enables JSON serialization/deserialization with the lazy SerDe? Or is there any other SerDe that can correctly process non-UTF-8 characters on Hive 2 with Tez?

Thank you,

Best Regards,
Dana Ram Meghwal
Software Engineer
dana...@saavn.com
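
A minimal Python sketch of the symptom described above, under the assumption that LazySimpleSerDe treats each line as delimited text and splits it on its default field delimiter (Ctrl-A, \x01) rather than parsing JSON. The helper split_delimited and the shortened sample record are hypothetical and only illustrate the behaviour; this is not Hive's actual code.

```python
import json

# Assumption: default field delimiter for delimited text is Ctrl-A (\x01).
FIELD_DELIM = "\x01"

# Column names taken from the table definition in the post.
COLUMNS = ["uid", "city", "user", "songcount", "songid_list"]


def split_delimited(line, columns=COLUMNS, delim=FIELD_DELIM):
    """Hypothetical helper: map a delimited line onto the column list.

    Columns with no corresponding field are filled with None,
    standing in for NULL in the query result.
    """
    parts = line.split(delim)
    parts += [None] * (len(columns) - len(parts))
    return dict(zip(columns, parts))


# Shortened version of the sample record from the post.
record = '{"uid":"xxxxxxyyyy","state":"delhi","country":"IN","city":"new delhi"}'

# Delimiter-based view: the line contains no \x01, so it is a single field.
row = split_delimited(record)
print(row["uid"] == record)  # whole JSON line sits in the first column
print(row["city"])           # no second field, so None (NULL)

# JSON-aware view, for contrast: values are looked up by key.
parsed = json.loads(record)
print(parsed["uid"], parsed["city"])
```

Under that assumption, a JSON line never contains the field delimiter, so the entire record lands in uid and every remaining column comes back NULL, which matches the query output reported above. A JSON-aware SerDe instead looks values up by key, as the json.loads contrast shows.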