Hey Zoltan,

Thanks for the response. When I call "select version()" I get:

3.1.0.3.1.4.0-315 re8d79f440455fa4400daf79974666b3055f1730f 

So a couple of patch versions old - any idea if this was a known bug before?

Regards,
Nakul

From: user@hive.apache.org At: 02/12/20 12:31:33To:  Nakul Khanna (BLOOMBERG/ 
LONDON ) ,  user@hive.apache.org
Cc:  Jacky Lee (BLOOMBERG/ PRINCETON ) ,  He Chen (BLOOMBERG/ PRINCETON ) ,  
Peter Babinski (BLOOMBERG/ PRINCETON ) ,  Bernat Gabor (BLOOMBERG/ LONDON ) ,  
Shashank Singh (BLOOMBERG/ PRINCETON ) 
Subject: Re: Issues with aggregating on map values

Hey Nakul!

It's not clear which version you are using;
I've checked this issue on apache/master and the 3.1.2 release - and both of 
them returned accurate results.
You could execute: 'select version()' ; or run 'hive --version' in a commandline

cheers,
Zoltan

On 2/11/20 11:38 AM, Nakul Khanna (BLOOMBERG/ LONDON) wrote:
> Creating the table
> 
> CREATE TABLE foo
> (tags MAP<String, String>, size int);
> INSERT INTO foo VALUES
> (map("country", "USA"), 10),
> (map("country", "USA"), 20),
> (map("country", "UK"), 30);
> 
> SELECT * FROM foo
> 
> +--------------------+-----------+
> | foo.tags | foo.size |
> +--------------------+-----------+
> | {"country":"USA"} | 10 |
> | {"country":"USA"} | 20 |
> | {"country":"UK"} | 30 |
> +--------------------+-----------+
> 
> Aggregating the Table
> 
> SELECT DISTINCT tags["country"] from foo;
> 
> +-------+
> | _c0 |
> +-------+
> | USA |
> | NULL |
> +-------+
> 
> SELECT tags["country"], sum(size) FROM foo GROUP BY tags["country"];
> 
> +-------+------+
> | _c0 | _c1 |
> +-------+------+
> | USA | 10 |
> | NULL | 50 |
> +-------+------+
> 
> And even more strangely, with a subquery:
> 
> SELECT flattened.country, sum(flattened.size)
> FROM (
> SELECT tags["country"] as country, size
> FROM foo
> WHERE tags["country"] IS NOT NULL
> ) as flattened
> GROUP BY flattened.country;
> 
> +--------------------+------+
> | flattened.country | _c1 |
> +--------------------+------+
> | USA | 10 |
> +--------------------+------+
> 
> ---------------
> 
> Is there any way to stop this from happening and get the correct aggregation 
behaviour? The only method I've found is to create a new table using the query, 
write that to 
> disk and then do the aggregation on that.


Reply via email to