Venki Korukanti created HIVE-12680:
--------------------------------------

Summary: Binary type partition column values are incorrectly serialized and deserialized
Key: HIVE-12680
URL: https://issues.apache.org/jira/browse/HIVE-12680
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: Venki Korukanti
Priority: Minor
Here are the repro steps:
{code}
CREATE TABLE kv_binary(key INT, value STRING) PARTITIONED BY (binary_part BINARY);

INSERT INTO TABLE kv_binary PARTITION (binary_part='somevalue') SELECT * FROM kv LIMIT 1;

Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-12-15 13:34:15,758 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1142919541_0001
Loading data to table default.kv_binary partition (binary_part=[B@15871)
Partition default.kv_binary{binary_part=[B@15871} stats: [numFiles=1, numRows=1, totalSize=13, rawDataSize=12]
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 8192 HDFS Write: 11733 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
{code}

The partition that gets created has a Java object reference as its value in the FileSystem:
{code}
hadoop fs -ls /user/hive/warehouse/kv_binary
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-12-15 13:34 /user/hive/warehouse/kv_binary/binary_part=%5BB@15871
{code}

Selecting from the same table:
{code}
hive> SELECT * FROM kv_binary;
OK
238 val/238= [B@15871
{code}

This makes binary partitions unusable, although binary partition columns don't seem to be commonly used. Logging the bug for tracking purposes. It looks like somewhere the code is calling toString() on a byte[]. BTW, this works fine in Hive 1.0.0.
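To illustrate the suspected cause: in Java, calling toString() on a byte[] falls through to Object.toString(), which returns the array type descriptor "[B", an "@", and the identity hash code in hex. That matches the "[B@15871" value seen above, and percent-encoding the leading "[" gives the "%5BB@15871" directory name. The minimal sketch below is not Hive's code; the class name, the method names, and the simplified escaping set are hypothetical, and the decode step assumes the partition value is valid UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not Hive source): contrasts calling toString()
// on a byte[] with explicitly decoding the bytes.
public class BinaryPartitionValueDemo {

    // Suspected buggy path: Object.toString() on an array yields the JVM type
    // descriptor "[B" + "@" + hex identity hash code, e.g. "[B@15871".
    public static String buggyPartitionValue(byte[] value) {
        return value.toString();
    }

    // Possible fix: decode the bytes back into the literal the user supplied.
    // Assumption: the partition value is valid UTF-8.
    public static String fixedPartitionValue(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    // Hypothetical, simplified path escaping: percent-encodes a few characters,
    // including '[', which is how "[B@15871" becomes "%5BB@15871" in the
    // directory name. Real-world escaping typically covers more characters.
    public static String escapeForPath(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '[' || c == ']' || c == '%' || c == '/' || c == '=') {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] part = "somevalue".getBytes(StandardCharsets.UTF_8);
        // The hash suffix differs from run to run.
        String buggy = buggyPartitionValue(part);
        System.out.println("buggy dir: binary_part=" + escapeForPath(buggy));
        System.out.println("fixed dir: binary_part=" + escapeForPath(fixedPartitionValue(part)));
    }
}
```

The buggy value is not even stable: the identity hash changes between JVM runs, so a later query cannot reliably map the directory name back to the original bytes.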