Hey All,

I am using Hive 2.0 with an external metastore on EMR 5.0.0, with Tez as the
execution engine.
Our data is stored in JSON format, so for serialization and deserialization
we are planning to use the lazy SerDe
(class name 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe').

My table definition is:

CREATE EXTERNAL TABLE IF NOT EXISTS daily_active_users_summary_json_partition_dt_paths_v1
(uid string, city string, user string, songcount string, songid_list array<string>)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('paths'='uid,city,user,songcount,songid_list')
LOCATION 's3://<bucketname removed>/users/daily_active_users_summary_json_partition_dt';


The data looks like this:

{"uid":"xxxxxxyyyy","listening_user_flag":"non_listening","platform":"android","model":"micromax a110q","aquisition_channel":"organic","state":"delhi","app_version":"3.2:","country":"IN","city":"new delhi","new_listening_user_flag":"non_listening","manufacturer":"Micromax","login_mode":"loggedout","new_user_flag":"returning","digital_channel":"Not Source"}


Note: I have pasted only one record from the table here.


Now, when I run the query

select * from daily_active_users_summary_json_partition_dt_paths_v1 limit 5;


the first field of the table contains the complete record, and the rest of
the fields are shown as NULL.
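
My rough understanding of this symptom, sketched in Python rather than taken from Hive's code: if each line is treated as delimited text instead of JSON, and the field delimiter (I am assuming the default Ctrl-A, '\x01') never occurs in the line, then the whole record ends up in the first column. The function name split_like_delimited_text and the shortened sample record are only for illustration.

```python
# Rough simulation only; this is not Hive code.
# Assumption: each row is split on the Ctrl-A ('\x01') field delimiter and
# columns that receive no value are reported as NULL (None here).
FIELD_DELIMITER = "\x01"
COLUMNS = ["uid", "city", "user", "songcount", "songid_list"]


def split_like_delimited_text(line, columns=COLUMNS, delimiter=FIELD_DELIMITER):
    """Map one text line onto the columns by splitting on the delimiter."""
    parts = line.split(delimiter)
    # Pad with None so every column gets a value, then drop any extras.
    padded = parts + [None] * (len(columns) - len(parts))
    return dict(zip(columns, padded[:len(columns)]))


# Shortened, illustrative version of the record shown earlier.
record = '{"uid":"xxxxxxyyyy","city":"new delhi","country":"IN"}'
row = split_like_delimited_text(record)
print(row)
```

In this sketch the JSON text contains no '\x01', so "uid" receives the entire line and every other column is None, which matches what I see in the query output.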

When I use a different SerDe, 'org.apache.hive.hcatalog.data.JsonSerDe',
the above query works fine and the data is deserialized correctly. However,
we want to use the lazy SerDe because our data contains non-UTF-8
characters, and the latter SerDe does not support serializing or
deserializing non-UTF-8 characters.
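
To make "non-UTF-8 character" concrete, here is a small Python sketch of the kind of check I mean. The sample bytes are made up for illustration and are not from our data, and is_valid_utf8 is a hypothetical helper name.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A record encoded as UTF-8 decodes without problems.
good = '{"city":"new delhi"}'.encode("utf-8")

# Made-up example: byte 0xE9 is 'é' in Latin-1, but here it is not followed
# by the continuation bytes UTF-8 requires, so strict decoding fails.
bad = b'{"city":"caf\xe9"}'

print(is_valid_utf8(good), is_valid_utf8(bad))
```

Records like the second one are the ones that cause trouble for us.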


Can you please help me solve this? We would strongly prefer to use the lazy
SerDe, as we have already experimented with other SerDes and none of them
works for us. Is there any configuration that enables correct
serialization/deserialization while using the lazy SerDe?

Or is there any other SerDe that can properly process non-UTF-8 characters
with Hive 2 and Tez?

Thank you


Best Regards,
Dana Ram Meghwal
Software Engineer
dana...@saavn.com
