Hey Zoltan, Thanks for the response. When I call "select version()" I get:
3.1.0.3.1.4.0-315 re8d79f440455fa4400daf79974666b3055f1730f So a couple of patch versions old - any idea if this was a known bug before? Regards, Nakul From: user@hive.apache.org At: 02/12/20 12:31:33To: Nakul Khanna (BLOOMBERG/ LONDON ) , user@hive.apache.org Cc: Jacky Lee (BLOOMBERG/ PRINCETON ) , He Chen (BLOOMBERG/ PRINCETON ) , Peter Babinski (BLOOMBERG/ PRINCETON ) , Bernat Gabor (BLOOMBERG/ LONDON ) , Shashank Singh (BLOOMBERG/ PRINCETON ) Subject: Re: Issues with aggregating on map values Hey Nakul! It's not clear which version you are using; I've checked this issue on apache/master and the 3.1.2 release - and both of them returned accurate results. You could execute: 'select version()' ; or run 'hive --version' in a commandline cheers, Zoltan On 2/11/20 11:38 AM, Nakul Khanna (BLOOMBERG/ LONDON) wrote: > Creating the table > > CREATE TABLE foo > (tags MAP<String, String>, size int); > INSERT INTO foo VALUES > (map("country", "USA"), 10), > (map("country", "USA"), 20), > (map("country", "UK"), 30); > > SELECT * FROM foo > > +--------------------+-----------+ > | foo.tags | foo.size | > +--------------------+-----------+ > | {"country":"USA"} | 10 | > | {"country":"USA"} | 20 | > | {"country":"UK"} | 30 | > +--------------------+-----------+ > > Aggregating the Table > > SELECT DISTINCT tags["country"] from foo; > > +-------+ > | _c0 | > +-------+ > | USA | > | NULL | > +-------+ > > SELECT tags["country"], sum(size) FROM foo GROUP BY tags["country"]; > > +-------+------+ > | _c0 | _c1 | > +-------+------+ > | USA | 10 | > | NULL | 50 | > +-------+------+ > > And even more strangely, with a subquery: > > SELECT flattened.country, sum(flattened.size) > FROM ( > SELECT tags["country"] as country, size > FROM foo > WHERE tags["country"] IS NOT NULL > ) as flattened > GROUP BY flattened.country; > > +--------------------+------+ > | flattened.country | _c1 | > +--------------------+------+ > | USA | 10 | > +--------------------+------+ > > --------------- > > Is there any way to stop this from happening and get the correct aggregation behaviour? The only method I've found is to create a new table using the query, write that to > disk and then do the aggregation on that.