Hi, Some background information:
We have a really large base table (~1P) — (lets call it raw_table) partitioned like yyyy_mm_dd=2014-02-11/hh=12/flag=1 (or yyyy_mm_dd=2014-02-11/hh=12/flag=0) I’m testing creating a view on that table to be able to have a consistent user accessible interface instead of letting the users to use the actual table (we’ve changed the format of the table multiple times in the past and we’ll possibly do so again). I am creating the view like this[1] and I’ve also tested view partitions[2]. The version of Hive is 0.10.0 The issues I came across so far are: 1) I couldn’t figure out the actual purpose of the view partitions. The document[3] lists several use cases and mentioned “infer view partitions automatically based on the partitions of the underlying tables" which leads to an open ticket[4] which in turn lists another one (HIVE-1941) which is closed. However I fail to see how to create a derived view partition. And with the definition I’ve used[2] they don’t seem to have the same effect. 2) You need to specify the partitions in the query to the view explicitly to have the partition pruning come into effect. However, when I say “select * from raw_table limit 10” it selects a subset somehow (explain and explain extended does not show any targeted partitions with such queries — is this a bug?). But doing the same query on the view leads to a full table scan. 2.1) I see a similar behaviour with things like “select ... from myview tablesample (bucket 1 out of $bucketsize on $field) I’m wondering if this is just the way the views work at the moment or is there a workaround to improve, lets say: silly queries? As the interface seems inconsistent between the table and view access. Thanks, Burak =========[1]========= CREATE VIEW myview AS SELECT `combined`.`column1`, `combined`.`column2`, `combined`.`yyyy_mm_dd`, `combined`.`hh`, `combined`.`column3`, `combined`.`column4`, `combined`.`column5` FROM ( SELECT `raw`.`column1`, `raw`.`column2`, `raw`.`yyyy_mm_dd`, `raw`.`hh`, `raw`.`column3`, `raw`.`column4`, `raw`.`column5` FROM `default`.`raw_table` `raw` ) `combined` =========[2]========= CREATE VIEW myview PARTITIONED ON ( yyyy_mm_dd, hh, type, persona, dc ) AS ( ... ) =========[3]=========[ https://cwiki.apache.org/confluence/display/Hive/PartitionedViews