I am not a code expert, but this looks very much like the bug I posted. My bug, however, does not use INSERT OVERWRITE (just INSERT INTO), and I am not doing any GROUP BY (probably not an issue).
Just to be clear, this is probably the same issue as mine, but someone with more knowledge of the underlying structures may see something different in the OVERWRITE vs. INTO distinction.

On Sat, Jan 26, 2013 at 9:20 AM, Philip Tromans <philip.j.trom...@gmail.com> wrote:

> This is a known (recently fixed) bug:
>
> https://issues.apache.org/jira/browse/HIVE-3699
>
> Phil.
>
>
> On 26 January 2013 15:17, John Omernik <j...@omernik.com> wrote:
>
>> I ran into an interesting bug. Basically, if your FROM() source is
>> a partitioned table and you use a where clause that prunes, all of the
>> INSERT HERE SELECT * WHERE x=y statements ignore their specified where
>> clauses. This does not occur if the source partition is not specified,
>> but if the source has where partition = 'x' then the where on each
>> individual insert is ignored...
>>
>> I've included some files here:
>>
>> testdata.tsv - Tab delimited data to prove the issue
>> create_tables.hive - Creates a database and tables as well as loads the
>> data from the TSV
>>
>> Test Cases:
>> I created these test case files in a way that there are three types of
>> insert in each case: 1. Load all data from initial statement, 2. Load
>> partial data (use a limiting clause such as where day >= '2013-01-05'),
>> and 3. Load NO data from the initial statement (where 1 = 0)
>>
>> These tests are all run on hive 0.9
>>
>> multi-flat-flat.hive - The source table and the dest tables are not
>> partitioned, the where clauses work as expected:
>>
>> 19 Rows loaded to multi_bug_flat
>> 0 Rows loaded to multi_bug_flat3
>> 15 Rows loaded to multi_bug_flat2
>>
>> multi-part-part.hive - The source table and the dest tables are
>> partitioned. The where clauses are not honored:
>>
>> 9 Rows loaded to multi_bug_part3
>> 9 Rows loaded to multi_bug_part2
>> 9 Rows loaded to multi_bug_part
>>
>> multi-flat-part.hive - The source table is flat, the dest table is
>> partitioned - The where clauses work as expected:
>>
>> 0 Rows loaded to multi_bug_part3
>> 15 Rows loaded to multi_bug_part2
>> 19 Rows loaded to multi_bug_part
>>
>> multi-part-flat.hive - The source table is partitioned, the dest table is
>> flat - The where clauses are not honored:
>>
>> 9 Rows loaded to multi_bug_flat
>> 9 Rows loaded to multi_bug_flat3
>> 9 Rows loaded to multi_bug_flat2
>>
>> multi-part-specified.hive - The source and dest are partitioned, but
>> there is no partition pruning statement in the from (); this works as
>> expected:
>>
>> 0 Rows loaded to multi_bug_part3
>> 15 Rows loaded to multi_bug_part2
>> 19 Rows loaded to multi_bug_part
>>
>>
>> Thoughts?
>>
>
>
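For anyone reading without the attached files, the failing shape (the multi-part-flat case) might look roughly like the sketch below. This is a reconstruction from the description only, not the contents of the actual .hive files: the source table name multi_bug_source_part, the partition column part_col, and its value 'x' are all hypothetical, and it assumes the destination tables' columns line up with the source's.

```sql
-- Hypothetical reconstruction of the multi-insert pattern described above.
-- multi_bug_source_part, part_col and 'x' are made-up names, not from the
-- original test files; the three destination tables are from the thread.
FROM (
  -- Partition-pruning predicate on the source; per the report, its presence
  -- is what causes the per-insert WHERE clauses below to be ignored.
  SELECT * FROM multi_bug_source_part WHERE part_col = 'x'
) src
-- 1. Load all data from the initial statement
INSERT INTO TABLE multi_bug_flat
  SELECT *
-- 2. Load partial data
INSERT INTO TABLE multi_bug_flat2
  SELECT * WHERE day >= '2013-01-05'
-- 3. Load no data
INSERT INTO TABLE multi_bug_flat3
  SELECT * WHERE 1 = 0;
```

With the bug present, all three destinations would receive the same row count (9 in the results above) instead of differing counts.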