Created branch vectorization for this dev work.
http://svn.apache.org/repos/asf/hive/branches/vectorization/


Ashutosh


On Tue, Apr 9, 2013 at 8:50 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

> Sounds good. I will create a branch soon.
>
> Thanks,
> Ashutosh
>
>
> On Mon, Apr 8, 2013 at 7:31 PM, Namit Jain <nj...@fb.com> wrote:
>
>> Sounds good to me
>>
>>
>> On 4/9/13 12:04 AM, "Jitendra Pandey" <jiten...@hortonworks.com> wrote:
>>
>> >I agree that we shouldn't wait too long before merging the branch.
>> >We are aiming to have basic queries working within a month from now
>> >and will definitely propose merging the branch back into trunk at that
>> >point. We will limit the scope of the work on the branch to just a few
>> >operators and primitive datatypes. Does that sound reasonable?
>> >
>> >regards
>> >jitendra
>> >
>> >On Wed, Apr 3, 2013 at 9:03 PM, Namit Jain <nj...@fb.com> wrote:
>> >
>> >> There is no right answer, but I feel that if you go a long way down
>> >> this path, it will be very difficult to merge back. Given that this
>> >> is not new functionality but an improvement to existing code (which
>> >> will also evolve), it will become difficult to maintain/review a big
>> >> diff in the future.
>> >>
>> >> I haven't thought much about it, but you can start by creating the
>> >> high-level interfaces first, and then go from there. For example,
>> >> create interfaces for operators which take in an array of rows instead
>> >> of a single row; initially the array size can always be 1. Then
>> >> proceed from there.
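
The interface-first idea above can be sketched as follows. This is only an illustration under assumptions: `RowBatchOperator` and `CollectingOperator` are hypothetical names, not Hive classes, and a row is modeled as a plain `Object[]`. The batch entry point simply loops over the existing row-at-a-time logic, so a caller can pass a batch of size 1 at first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch-oriented operator interface (not a Hive API).
interface RowBatchOperator {
    // Process the first `size` rows of `rows`; each row is an array of column values.
    void process(Object[][] rows, int size);
}

// Hypothetical operator that keeps its row-at-a-time logic behind the batch interface.
final class CollectingOperator implements RowBatchOperator {
    final List<Object[]> seen = new ArrayList<>();

    @Override
    public void process(Object[][] rows, int size) {
        // Plain loop over the batch; works unchanged when size == 1.
        for (int i = 0; i < size; i++) {
            processRow(rows[i]);
        }
    }

    // Stand-in for the existing single-row code path.
    private void processRow(Object[] row) {
        seen.add(row);
    }
}
```

With this shape, callers can move to the batch signature first and batch sizes can grow later without another interface change.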
>> >>
>> >> What makes you think merging a branch 6 months/1 year from now will be
>> >> easier than working on the current branch?
>> >>
>> >> Having said that, both approaches can be made to work, but I think you
>> >> are just delaying the merging work instead of taking the hit upfront.
>> >>
>> >> Thanks,
>> >> -namit
>> >>
>> >>
>> >>
>> >> On 4/4/13 2:40 AM, "Jitendra Pandey" <jiten...@hortonworks.com> wrote:
>> >>
>> >> >   We did consider implementing these changes on the trunk. But it
>> >> >would take several patches in various parts of the code before a
>> >> >simple end-to-end query can be executed on the vectorized path. For
>> >> >example, a patch for vectorized expressions will be a significant
>> >> >amount of code, but will not be used in a query until a vectorized
>> >> >operator is implemented and the query plan is modified to use the
>> >> >vectorized path. Vectorization of even basic expressions becomes
>> >> >nontrivial because we need to optimize for various cases, such as
>> >> >chains of expressions, non-null columns, or repeating values, and
>> >> >also handle nullable columns, short-circuit optimization, etc.
>> >> >Careful handling of these is important for performance gains.
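
To make the special cases above concrete, here is a minimal sketch of a vectorized "column plus scalar" expression with separate paths for repeating values, non-null columns, and nullable columns. The class and field names (`LongColumn`, `noNulls`, `isRepeating`, `AddLongScalar`) are assumptions chosen for illustration and are not taken from the design document.

```java
// Hypothetical column batch; names are illustrative assumptions.
final class LongColumn {
    final long[] values;
    final boolean[] isNull;
    boolean noNulls = true;      // true when no entry in the batch is null
    boolean isRepeating = false; // true when every row has the value in slot 0

    LongColumn(int capacity) {
        values = new long[capacity];
        isNull = new boolean[capacity];
    }
}

final class AddLongScalar {
    // Computes out[i] = in[i] + scalar for the first n rows of the batch.
    static void evaluate(LongColumn in, long scalar, LongColumn out, int n) {
        out.noNulls = in.noNulls;
        out.isRepeating = in.isRepeating;

        if (in.isRepeating) {
            // Repeating values: a single computation covers the whole batch.
            out.isNull[0] = !in.noNulls && in.isNull[0];
            if (!out.isNull[0]) {
                out.values[0] = in.values[0] + scalar;
            }
            return;
        }
        if (in.noNulls) {
            // Non-null column: tight loop with no per-row null check.
            for (int i = 0; i < n; i++) {
                out.values[i] = in.values[i] + scalar;
            }
            return;
        }
        // Nullable column: propagate nulls and compute only non-null rows.
        for (int i = 0; i < n; i++) {
            out.isNull[i] = in.isNull[i];
            if (!in.isNull[i]) {
                out.values[i] = in.values[i] + scalar;
            }
        }
    }
}
```

Even this one expression needs three code paths, which gives a sense of why a patch covering many expressions would be large before any operator can use it.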
>> >> >
>> >> > Committing those intermediate patches to trunk without stabilizing
>> >> >them in a branch first might be a cause for concern.
>> >> >
>> >> >  A separate branch will let us make incremental changes to the system
>> >> >so that each patch addresses a single feature or functionality and is
>> >> >small enough to review.
>> >> >   We will make sure that the branch is frequently updated with the
>> >> >changes in the trunk to avoid conflicts at the time of the merge.
>> >> >  Also, we plan to propose merging the branch as soon as a basic
>> >> >end-to-end query begins to work and is sufficiently tested, instead of
>> >> >waiting for all operators to get vectorized. Initially our target is
>> >> >to make select and filter operators work with vectorized expressions
>> >> >for primitive types.
>> >> >
>> >> >   We will have a single global configuration flag that can be used to
>> >> >turn off the entire vectorization code path, and we will specifically
>> >> >test to make sure that when this flag is off there is no regression on
>> >> >the current system. When vectorization is turned on, we will have a
>> >> >validation step to make sure the given query is supported on the
>> >> >vectorization path; otherwise it will fall back to the current code
>> >> >path.
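
The flag-plus-validation scheme described above might look roughly like this. It is a sketch under assumptions: `ExecutionPathChooser` and its members are hypothetical names, the query plan is reduced to a list of operator names, and the supported set reflects only the initial select/filter target mentioned in the message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of Hive.
final class ExecutionPathChooser {
    enum Path { ROW_AT_A_TIME, VECTORIZED }

    // Assumed initial scope: only select and filter operators are vectorized.
    private static final Set<String> SUPPORTED_OPERATORS =
            new HashSet<>(Arrays.asList("SELECT", "FILTER"));

    // Returns the path to run: the global flag is checked first, then the plan is validated.
    static Path choose(boolean vectorizationEnabled, Iterable<String> planOperators) {
        if (!vectorizationEnabled) {
            // Flag off: the existing code path is used untouched.
            return Path.ROW_AT_A_TIME;
        }
        for (String op : planOperators) {
            if (!SUPPORTED_OPERATORS.contains(op)) {
                // Validation failed: fall back to the current code path.
                return Path.ROW_AT_A_TIME;
            }
        }
        return Path.VECTORIZED;
    }
}
```

Keeping the decision in one place means the flag-off case never reaches any vectorized code, which is the property the regression testing is meant to confirm.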
>> >> >
>> >> >  Although we intend to follow a commit-then-review policy on the
>> >> >branch for speed of development, each patch will have an associated
>> >> >jira and will be available for review and feedback.
>> >> >
>> >> >thanks
>> >> >jitendra
>> >> >
>> >> >On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <nj...@fb.com> wrote:
>> >> >
>> >> >> It will be difficult to merge back the branch.
>> >> >> Can you stage your changes incrementally?
>> >> >>
>> >> >> I mean, start with making the operators vectorized; it can be a
>> >> >> for loop to start with. I think it will be very difficult to merge
>> >> >> it back if we diverge on this.
>> >> >> I would recommend starting with simple interfaces for operators and
>> >> >> then plugging them in slowly instead of a new branch, unless this
>> >> >> approach is extremely difficult.
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> -namit
>> >> >>
>> >> >> On 4/3/13 1:52 AM, "Jitendra Pandey" <jiten...@hortonworks.com> wrote:
>> >> >>
>> >> >> >Hi Folks,
>> >> >> >     I want to propose the creation of a separate branch for the
>> >> >> >HIVE-4160 work. This is a significant amount of work, and support
>> >> >> >for even very basic functionality will need big chunks of code. It
>> >> >> >will also take some time to stabilize and test. A separate dev
>> >> >> >branch will allow us to do this work incrementally and
>> >> >> >collaboratively. We have already uploaded a design document on the
>> >> >> >jira for comments/feedback.
>> >> >> >
>> >> >> >thanks
>> >> >> >jitendra
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> ><http://hortonworks.com/download/>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >--
>> >> ><http://hortonworks.com/download/>
>> >>
>> >>
>> >
>> >
>> >--
>> ><http://hortonworks.com/download/>
>>
>>
>
