Hi Alexis,

Work is already underway to add the YSmart optimizer to Hive. Please take a look at https://issues.apache.org/jira/browse/HIVE-2206.

Thanks.

Carl

On Wed, Feb 8, 2012 at 6:17 PM, Alexis De La Cruz Toledo <alexis...@gmail.com> wrote:

> Hi! My name is Alexis. I am a master's student at Cinvestav, DF, México.
> I am currently working on my thesis and I would like to participate in
> Google Summer of Code 2012
> (http://google-melange.appspot.com/gsoc/events/google/gsoc2012).
> I'm interested in improving Hive, and I have been studying Hadoop and Hive.
>
> I am interested in the plan tree generated by Hive.
> I noticed that Hive reads the same table many times
> and generates many Hadoop jobs when the query could be
> expressed in fewer jobs, with only one read of the table,
> if I programmed the same query directly in Hadoop.
>
> I think I can reduce the number of jobs needed to process a query
> and also read each table only once, even if it is used in several jobs.
>
> The solution could be approached in two ways:
>
> 1. Change the part where the DAG is created, applying the optimizations
>    at that point.
> 2. Apply the optimizations after the DAG is created; these
>    optimizations can be implemented in a separate class.
>
> Where could I do this? I think the method that compiles queries is
> the compile method in the Driver class, am I right?
> Can someone point me to where I could implement it?
>
> There is a paper that discusses what I describe:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> We can take it and improve on it, or implement our own ideas.
>
> Personally, I would like to pursue the second option because of time
> constraints.
>
> Also, is anyone interested in working with me and being my mentor in
> Google Summer of Code 2012?
>
> Thanks.
>
> Regards.
>
> --
> Ing. Alexis de la Cruz Toledo.
> *Av. Instituto Politécnico Nacional No. 2508 Col. San Pedro Zacatenco.
> México, D.F, 07360*
> *CINVESTAV, DF.*