Hello,

Lately our user base has grown, so our input files have increased considerably in both size and number.
One of our processing steps runs a query of the form shown at the end of this email. My problem is that, apparently, the processing sometimes misses some of the input files (for the 2nd SELECT in most cases).

I'm using Hive 0.6 and Hadoop 0.20.2 on 64-bit Debian 5, and we connect to a Hive server instance using JDBC.

Any idea what parameters I could tune, or whether any tickets have been opened on this problem? I searched the Hive JIRA but have found nothing so far. The only thing that I think might be related is https://issues.apache.org/jira/browse/HIVE-1884

SELECT t.a, sum(t.b), sum(t.c), sum(t.d)
FROM (
    SELECT a, sum(x) as b, sum(y) as c, sum(z) as d
    FROM T1
    WHERE ...
    GROUP BY ...
  UNION ALL
    SELECT a, sum(x) as b, sum(y) as c, sum(z) as d
    FROM T2
    WHERE ...
    GROUP BY ...
  UNION ALL
    SELECT a, sum(x) as b, sum(y) as c, sum(z) as d
    FROM T3
    WHERE ...
    GROUP BY ...
) t
GROUP BY ...

-- Florin