Re: Query Optimization in Hive

bharath vissapragada Tue, 01 Feb 2011 00:43:17 -0800

Hi,

I updated the JIRA. Kindly give your suggestions so that I can go
ahead and complete the task.


Thanks


On Tue, Feb 1, 2011 at 12:25 PM, bharath vissapragada
<bharathvissapragada1...@gmail.com> wrote:
> Thanks for replying, Namit.
>
> It is motivating to receive an email from the authors of Hive :).
>
> I filed the JIRA based on the discussion:
> https://issues.apache.org/jira/browse/HIVE-1938
>
> I will try to update the JIRA with my idea ASAP.
>
> Thanks
> Bharath,V
> 4th year Undergrad,IIIT Hyderabad.
> w: http://research.iiit.ac.in/~bharath.v
>
>
>
> On Tue, Feb 1, 2011 at 11:46 AM, Namit Jain <nj...@fb.com> wrote:
>> Bharath,
>>
>> This would be great.
>>
>> Why don't you write up something about how you are planning to proceed?
>> File a new JIRA and upload some design notes/spec there.
>> We can definitely sync up from there.
>>
>>
>> This feature would be very useful to the community. We at Facebook
>> would definitely like to use it.
>>
>>
>> Thanks,
>> -namit
>>
>>
>> On 1/31/11 9:50 PM, "bharath vissapragada"
>> <bharathvissapragada1...@gmail.com> wrote:
>>
>>>Hi Ning, Anja,
>>>
>>>I am doing my Master's thesis on this topic. I have implemented all
>>>SQL features like joins, selects, etc. on top of Hadoop (before knowing
>>>about Hive), and we have derived some basic cost models for join
>>>reordering which seem to work fine on some basic scales of TPC-H
>>>datasets. Later I came to know about Hive, and I am trying to
>>>implement the same in Hive.
>>>
>>>Right now I am in the process of understanding Hive's source, and I am
>>>almost done with the "ql" package. I think it would be great if you guys
>>>could help us in this regard. I am a bit confused about the
>>>implementation of joins; once I'm done with that, I can modify the
>>>"joinReorder" of the Optimizer package by using the cost formulae and
>>>metadata. It would be a great opportunity to work with you guys at FB
>>>and contribute to Hive.
>>>
>>>Thanks
>>>Bharath,V
>>>4th year Undergrad,IIIT Hyderabad.
>>>w: http://research.iiit.ac.in/~bharath.v
>>>
>>>On Tue, Feb 1, 2011 at 9:22 AM, Ning Zhang <nzh...@fb.com> wrote:
>>>> Hi Anja,
>>>>
>>>> As you noticed, Hive only has limited support for cost-based
>>>>optimization. One of the reasons is that Hive used to have a very small
>>>>number of alternative execution plans to choose from. One exception is
>>>>mapjoin vs. common join. Liying Tang did some work during his last
>>>>internship to convert common joins to mapjoins in a rule-based fashion.
>>>>One of his planned future tasks is to automatically convert common joins
>>>>to mapjoins based on stats. There is also ongoing work on indexes in
>>>>Hive. With index support, CBO will be much needed.
>>>>
>>>> In order for a decent CBO to work, we need stats and cost models. There
>>>>is some work on stats: table/partition-level stats are already
>>>>supported, and there is a JIRA open for column-level stats (HIVE-1362).
>>>>The cost model is much more complex in a Hadoop environment and closely
>>>>dependent on the mapjoin/index implementations. Once all of these are in
>>>>place, we can then talk about plan enumeration, etc.
>>>>
>>>> So yes, we are interested in CBO, but it is a large area and many
>>>>missing pieces in Hive still need to be filled in. If you have a
>>>>particular interest in some area, you can propose your ideas on the
>>>>hive-...@hive.apache.org mailing list, or even apply for an internship
>>>>at FB if you would like to work closely with us.
>>>>
>>>> Thanks,
>>>> Ning
>>>>
>>>> On Jan 31, 2011, at 2:04 PM, Anja Gruenheid wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I'm a graduate student from Georgia Tech and I'm working with Hive for
>>>>>a research project. I am interested in query optimization and the Hive
>>>>>MetaStore in that context. Working through the documentation and code,
>>>>>I noticed that the implementation right now is using a rule-based
>>>>>optimization system. Therefore, I was wondering whether cost-based
>>>>>query optimization will be a future task in the development of Hive and
>>>>>if it would be possible for me to cooperate with the developers of Hive
>>>>>to advance the project in general.
>>>>>
>>>>> Best regards,
>>>>> Anja Gruenheid
>>>>
>>>>
>>
>>
>
