I assume the reason for this is that the Hive compiler has no way of 
determining that the 'day' that is input into the transform script is the same 
'day' that is output from the transform script. Even if it did, it's unclear 
whether pushing the predicate down would be legal without knowing the semantics 
of the transformation. Any optimization to be done here will likely need an 
annotation somewhere to say that certain columns in the output of a transform 
refer to specific columns in the input of a transform for predicate pushdown 
purposes (and that such pushdown is legal for this transformation).
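
In the meantime, the only form shown in this thread that prunes is the one 
where the filter sits on source_table itself, before the TRANSFORM. A minimal 
sketch of that as an ad hoc query rather than a view (table and column names 
are taken from the original message; I have not measured this exact query, so 
treat the pruning behavior as an assumption based on the 16 map tasks reported 
below for the equivalent view):

```sql
-- The filter is applied to the partitioned table before rows reach the script,
-- so the compiler sees an ordinary partition predicate on source_table.
SELECT TRANSFORM (day, ip) USING 'cat' AS (day, ip)
FROM source_table
WHERE day = '2012-10-08';
```

The cost is that the date is fixed in the query text, which is exactly what 
the view was meant to hide from the user.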

thanks,
Shrikanth
On Oct 10, 2012, at 12:04 PM, John Omernik wrote:

> Greetings all, I am trying to incorporate a TRANSFORM into a view (so we can 
> abstract the transform script away from the user)
> 
> 
> 
> As a test, I have a table partitioned on day (formatted as YYYY-MM-DD) with 
> lots of partitions
> 
> and I tried this
> 
> CREATE VIEW view_transform as
> Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table;
> 
> The reason I used 'cat' in my test is if this works, I will distribute my 
> transform scripts to each node manually, I know each node has cat, so this 
> works as a test. 
> 
> When I run 
> 
> SELECT * from view_transform where day = '2012-10-08'
> 
> 10,432 map tasks get spun up. 
> 
> If I rewrite the view to be
> 
> CREATE VIEW view_transform as
> Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table where 
> day = '2012-10-08';
> 
> Then only 16 map tasks get spun up (the desired behavior, but the pruning is 
> happening in the view not in the query)
> 
> Thus I wanted input on whether this should be considered a bug, i.e. should 
> we be able to define a partition spec in a view that uses a transform that 
> allows normal pruning to occur even though the partition spec will be passed 
> to the transform script?  I think we should, and it's likely doable somehow. 
> This would be awesome for a number of situations where you may want to expose 
> "transformed" data to analysts without the mess of having them format their 
> script for transform. 
> 
> 
