I did try nesting; the problem is that I am trying to do it in a view, and I think something gets lost in translation...
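For reference, here is a minimal sketch of the nested form suggested in the quoted message, reusing the source_table, day, and ip names from my original test. The inner query applies the partition filter before the rows reach TRANSFORM, so pruning should not depend on the predicate being pushed through the script. The alias t is only illustrative, and I have not verified this against a specific Hive version.

```sql
-- Filter on the partition column in the inner query, so only the
-- matching partition is read before rows are handed to the script.
SELECT TRANSFORM (t.day, t.ip) USING 'cat' AS (day, ip)
FROM (
  SELECT day, ip
  FROM source_table
  WHERE day = '2012-10-08'
) t;
```

Inside a view, the filter would have to be hard-coded in the inner query like this, which is the limitation described above: a WHERE clause applied to the view from outside still sits after the TRANSFORM.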
On Thu, Oct 11, 2012 at 8:32 AM, Edward Capriolo <[email protected]> wrote:

> Have you considered rewriting the query using nested FROM clauses?
> Generally, if Hive is not 'pushing down' as you would assume, nesting
> FROMs makes the query happen in a specific way.
>
> On Wednesday, October 10, 2012, John Omernik <[email protected]> wrote:
> > Agreed. That's the conclusion we came to as well. So it's less of a
> > bug and more of a feature request. I think one of the main advantages
> > of Hive is the flexibility in allowing non-technical users to run
> > basic queries without having to think about the transform stuff
> > (i.e. we in the IT shop can set up the transform). I like the
> > annotation idea that somehow the partition specs can be pushed
> > through (identified in some other way, etc.). I am new to the
> > Apache/JIRA world; what would you recommend for getting this into a
> > feature request for consideration? I am not a Java programmer, so my
> > idea may need to be paired with a champion to help implement it :)
> >
> > On Wed, Oct 10, 2012 at 3:24 PM, shrikanth shankar <[email protected]> wrote:
> >>
> >> I assume the reason for this is that the Hive compiler has no way of
> >> determining that the 'day' that is input into the transform script
> >> is the same 'day' that is output from the transform script. Even if
> >> it did, it's unclear if pushing down would be legal without knowing
> >> the semantics of the transformation. Any optimization to be done
> >> here will likely need an annotation somewhere to say that certain
> >> columns in the output of a transform refer to specific columns in
> >> the input of a transform for predicate pushdown purposes (and that
> >> such pushdown is legal for this transformation).
> >>
> >> thanks,
> >> Shrikanth
> >>
> >> On Oct 10, 2012, at 12:04 PM, John Omernik wrote:
> >>
> >> > Greetings all, I am trying to incorporate a TRANSFORM into a view
> >> > (so we can abstract the transform script away from the user).
> >> >
> >> > As a test, I have a table partitioned on day (in YYYY-MM-DD
> >> > format) with lots of partitions, and I tried this:
> >> >
> >> > CREATE VIEW view_transform as
> >> > Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table;
> >> >
> >> > The reason I used 'cat' in my test is that, if this works, I will
> >> > distribute my transform scripts to each node manually. I know each
> >> > node has cat, so this works as a test.
> >> >
> >> > When run,
> >> >
> >> > SELECT * from view_transform where day = '2012-10-08'
> >> >
> >> > 10,432 map tasks get spun up.
> >> >
> >> > If I rewrite the view to be
> >> >
> >> > CREATE VIEW view_transform as
> >> > Select TRANSFORM (day, ip) using 'cat' as (day, ip) from source_table
> >> > where day = '2012-10-08';
> >> >
> >> > then only 16 map tasks get spun up (the desired behavior, but the
> >> > pruning is happening in the view, not in the query).
> >> >
> >> > Thus I wanted input on whether this should be considered a bug.
> >> > I.e., should we be able to define a partition spec in a view that
> >> > uses a transform that allows normal pruning to occur even though
> >> > the partition spec will be passed to the transform script? I think
> >> > we should, and it's likely doable somehow. This would be awesome
> >> > for a number of situations where you may want to expose
> >> > "transformed" data to analysis without the mess of having them
> >> > format their script for transform.
