[ https://issues.apache.org/jira/browse/HIVE-11373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
eugeny birukov updated HIVE-11373:
----------------------------------
    Description:
I am trying to transform a JSON string into a MAP<STRING,STRING> using Python code:

import sys,re
for d in sys.stdin:
  r=d.replace('{','').replace('}','').replace('"','')
  r=re.sub('[:,]', '\003', r)
  print r.strip()

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;"
hive -e "SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_20150725150000_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}
Actual Result {"key1":"valu1\u0003key2\u0003value2"}

  was:
I try transform json string to Map<STRING,STRING> using python code

import sys,re
for d in sys.stdin:
  r=d.replace('{','').replace('}','').replace('"','')
  r=re.sub('[:,]', '\003', r)
  print r.strip()

echo '{"key1":"valu1","key2":"value2"}' > /tmp/json.txt;
hive -e "CREATE TABLE json(jsonStr STRING); load data local inpath '/tmp/json.txt' overwrite into table json;"
hive -e "CREATE TABLE d(jsondata MAP<STRING, STRING>); SELECT TRANSFORM (jsonStr) USING 's3://webgames-emr/hive/restore/json2map.py' AS (parsedjson MAP<STRING, STRING>) FROM json;"

converting to local s3://webgames-emr/hive/restore/json2map.py
Added resources: [s3://webgames-emr/hive/restore/json2map.py]
Query ID = hadoop_20150725150000_46c48f7d-92c6-41d7-9c54-a90d5b351722
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437833808701_0006, Tracking URL = http://ip-172-31-11-47.ec2.internal:20888/proxy/application_1437833808701_0006/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1437833808701_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-25 15:01:16,773 Stage-1 map = 0%, reduce = 0%
2015-07-25 15:01:34,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1437833808701_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 261 HDFS Write: 25 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
{"key1":"valu1\u0003key2\u0003value2"}
Time taken: 48.878 seconds, Fetched: 1 row(s)

Expected Result {"key1":"valu1","key2":"value2"}
Actual Result {"key1":"valu1\u0003key2\u0003value2"}


> Incorrect (de)serialization STRING field to MAP<STRING,STRING> in TRANSFORM operation
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-11373
>                 URL: https://issues.apache.org/jira/browse/HIVE-11373
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>   Affects Versions: 0.13.1, 1.0.0
>         Environment: Amazon EMR (AMI 3.8 with HIVE 0.13.1, emr-4.0.0 with HIVE 1.0)
>            Reporter: eugeny birukov

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
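
One possible reading of the actual result, offered as an assumption rather than a confirmed diagnosis: if the TRANSFORM output is parsed with the default LazySimpleSerDe delimiters, map entries are separated by \002 and each key is separated from its value by \003. Under that assumption, a script that replaces both ':' and ',' with \003 would produce a single entry whose value still contains the remaining \003 characters, which matches the output reported here. The sketch below is written for Python 3, uses a hypothetical helper name (json_to_hive_map), and has not been verified against the Hive versions listed in this issue.

```python
import json
import sys

# Assumed LazySimpleSerDe defaults (not confirmed anywhere in this issue):
ENTRY_DELIM = "\x02"  # separates one map entry from the next
KV_DELIM = "\x03"     # separates a key from its value


def json_to_hive_map(line):
    """Serialize a flat JSON object as a single MAP<STRING,STRING> field."""
    obj = json.loads(line)
    return ENTRY_DELIM.join(
        "{}{}{}".format(key, KV_DELIM, value) for key, value in obj.items()
    )


if __name__ == "__main__":
    # TRANSFORM feeds one input row per line on stdin.
    for raw in sys.stdin:
        raw = raw.strip()
        if raw:
            sys.stdout.write(json_to_hive_map(raw) + "\n")
```

Parsing with the json module instead of stripping braces and quotes also avoids breaking values that themselves contain ':' or ','.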