Hi all,

I've been working with Hive for some time.
At my company, we use Hive to query large datasets and have found it very easy to use. However, we also found that Hive lacks support for charsets other than UTF-8, so we have to manually convert our data files to UTF-8 before loading them into Hive.

So I have made a patch that lets Hive set a charset when creating a table. The SerDe then uses this charset property when it serializes or deserializes data. The modified HQL looks like this:

CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS TERMINATED BY '\t';

I'm very happy to contribute this to the community and look forward to your feedback.

Thanks,
Kai Zhang
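To illustrate how a SerDe could use a table-level charset like the "GBK" in the HQL above, here is a minimal sketch in Java (the language Hive is written in). The class `CharsetLineCodec` and its methods are hypothetical. They are neither Hive's SerDe API nor the code in the patch. The sketch only shows the idea: decode each row's raw bytes with the declared charset on read, and encode with the same charset on write, instead of assuming UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: shows how a SerDe could honor a
// table-level charset (e.g. ROW FORMAT CHARSET "GBK") for delimited text rows.
final class CharsetLineCodec {
    private final Charset charset;
    private final String fieldDelimiter;

    CharsetLineCodec(String charsetName, String fieldDelimiter) {
        // Throws UnsupportedCharsetException if the JVM does not know the name.
        this.charset = Charset.forName(charsetName);
        this.fieldDelimiter = fieldDelimiter;
    }

    // Deserialize: raw bytes from the data file -> column values.
    List<String> deserialize(byte[] rawLine) {
        String line = new String(rawLine, charset);
        // Limit -1 keeps trailing empty columns.
        return Arrays.asList(line.split(Pattern.quote(fieldDelimiter), -1));
    }

    // Serialize: column values -> bytes in the table's charset.
    byte[] serialize(List<String> fields) {
        return String.join(fieldDelimiter, fields).getBytes(charset);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is a two-character Chinese string; escapes avoid
        // depending on the source file's encoding.
        String original = "\u4e2d\u6587\tabc";
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK"));

        CharsetLineCodec codec = new CharsetLineCodec("GBK", "\t");
        List<String> cols = codec.deserialize(gbkBytes);
        byte[] roundTrip = codec.serialize(cols);

        // Reading the same bytes as UTF-8 (the current assumption) garbles them.
        String asUtf8 = new String(gbkBytes, StandardCharsets.UTF_8);

        System.out.println("GBK bytes: " + gbkBytes.length);                        // 8
        System.out.println("columns: " + cols.size());                              // 2
        System.out.println("round trip ok: " + Arrays.equals(roundTrip, gbkBytes)); // true
        System.out.println("UTF-8 decode matches: " + asUtf8.equals(original));     // false
    }
}
```

Keeping the charset as a per-table property means files in different encodings can live side by side without a separate conversion step before loading.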