Created branch vectorization for this dev work. http://svn.apache.org/repos/asf/hive/branches/vectorization/
Ashutosh

On Tue, Apr 9, 2013 at 8:50 PM, Ashutosh Chauhan <hashut...@apache.org> wrote:

> Sounds good. I will create a branch soon.
>
> Thanks,
> Ashutosh
>
> On Mon, Apr 8, 2013 at 7:31 PM, Namit Jain <nj...@fb.com> wrote:
>
>> Sounds good to me.
>>
>> On 4/9/13 12:04 AM, "Jitendra Pandey" <jiten...@hortonworks.com> wrote:
>>
>> >I agree that we shouldn't wait too long before merging the branch.
>> >We are aiming to have basic queries working within a month from now and
>> >will definitely propose merging the branch back into trunk at that point.
>> >We will limit the scope of the work on the branch to just a few operators
>> >and primitive datatypes. Does that sound reasonable?
>> >
>> >regards
>> >jitendra
>> >
>> >On Wed, Apr 3, 2013 at 9:03 PM, Namit Jain <nj...@fb.com> wrote:
>> >
>> >> There is no right answer, but I feel that if you go down this path a long
>> >> way, it will be very difficult to merge back. Given that this is not new
>> >> functionality but an improvement to existing code (which will also
>> >> evolve), it will become difficult to maintain and review a big diff in
>> >> the future.
>> >>
>> >> I haven't thought much about it, but you could start by creating the
>> >> high-level interfaces first and go from there. For example, create
>> >> interfaces for operators that take in an array of rows instead of a
>> >> single row; initially the array size can always be 1. Then proceed from
>> >> there.
>> >>
>> >> What makes you think merging a branch 6 months or a year from now will be
>> >> easier than working on the current branch?
>> >>
>> >> Having said that, both approaches can be made to work, but I think you
>> >> are just delaying the merging work instead of taking the hit upfront.
>> >>
>> >> Thanks,
>> >> -namit
>> >>
>> >> On 4/4/13 2:40 AM, "Jitendra Pandey" <jiten...@hortonworks.com> wrote:
>> >>
>> >> > We did consider implementing these changes on the trunk.
>> >> >But it would take several patches in various parts of the code before a
>> >> >simple end-to-end query can be executed on the vectorized path. For
>> >> >example, a patch for vectorized expressions will be a significant amount
>> >> >of code, but it will not be used in a query until a vectorized operator is
>> >> >implemented and the query plan is modified to use the vectorized path.
>> >> >Vectorization of even basic expressions becomes non-trivial because we
>> >> >need to optimize for various cases, such as chains of expressions,
>> >> >non-null columns, or repeating values, and also handle nullable columns,
>> >> >short-circuit optimization, etc. Careful handling of these is important
>> >> >for performance gains.
>> >> >
>> >> > Committing those intermediate patches to trunk without stabilizing them
>> >> >in a branch first might be a cause for concern.
>> >> >
>> >> > A separate branch will let us make incremental changes to the system so
>> >> >that each patch addresses a single feature or functionality and is small
>> >> >enough to review.
>> >> > We will make sure that the branch is frequently updated with the changes
>> >> >in the trunk to avoid conflicts at the time of the merge.
>> >> > Also, we plan to propose merging the branch as soon as a basic end-to-end
>> >> >query begins to work and is sufficiently tested, instead of waiting for
>> >> >all operators to be vectorized. Initially, our target is to make the
>> >> >select and filter operators work with vectorized expressions for
>> >> >primitive types.
>> >> >
>> >> > We will have a single global configuration flag that can be used to turn
>> >> >off the entire vectorization code path, and we will specifically test to
>> >> >make sure that when this flag is off there is no regression in the
>> >> >current system.
When vectorization is turned on, we will have a validation >> >>step to >> >> >make sure the given query is supported on the vectorization path >> >>otherwise >> >> >it will fall back to current code path. >> >> > >> >> > Although, we intend to follow commit then review policy on the >> branch >> >> >for >> >> >speed of development, each patch will have an associated jira and will >> >>be >> >> >available for review and feedback. >> >> > >> >> >thanks >> >> >jitendra >> >> > >> >> >On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <nj...@fb.com> wrote: >> >> > >> >> >> It will be difficult to merge back the branch. >> >> >> Can you stage your changes incrementally ? >> >> >> >> >> >> I mean, start with the making the operators vectorized - it can be a >> >>for >> >> >> loop to >> >> >> start with ? I think it will be very difficult to merge it back if >> we >> >> >> diverge on this. >> >> >> I would recommend starting with simple interfaces for operators and >> >>then >> >> >> plugging them >> >> >> in slowly instead of a new branch, unless this approach is extremely >> >> >> difficult. >> >> >> >> >> >> >> >> >> Thanks, >> >> >> -namit >> >> >> >> >> >> On 4/3/13 1:52 AM, "Jitendra Pandey" <jiten...@hortonworks.com> >> >>wrote: >> >> >> >> >> >> >Hi Folks, >> >> >> > I want to propose for creation of a separate branch for >> >>HIVE-4160 >> >> >> >work. This is a significant amount of work, and support for very >> >>basic >> >> >> >functionality will need big chunks of code. It will also take some >> >> >>time to >> >> >> >stabilize and test. A separate dev branch will allow us to do this >> >>work >> >> >> >incrementally and collaboratively. We have already uploaded a >> design >> >> >> >document on the jira for comments/feedback. 
>> >> >> >
>> >> >> >thanks
>> >> >> >jitendra
>> >> >> >
>> >> >> >--
>> >> >> ><http://hortonworks.com/download/>
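Namit's incremental suggestion is to give operators an interface that takes an array of rows instead of a single row, with the array size initially fixed at 1. The Java sketch below illustrates that idea under stated assumptions. Every type name in it (BatchOperator, RowOperator, RowLoopAdapter, BatchingForwarder) is hypothetical. None of them is Hive's actual operator API or code from the vectorization branch.

```java
import java.util.List;

// Hypothetical batch-oriented operator contract: each call receives an
// array of rows rather than a single row.
interface BatchOperator {
    void process(Object[][] rows);
}

// Hypothetical stand-in for an existing row-at-a-time operator.
interface RowOperator {
    void processRow(Object[] row);
}

// Puts a row-at-a-time operator behind the batch interface by looping over
// the batch (the "for loop to start with"), so callers can switch to the
// batch interface before any operator is actually vectorized.
final class RowLoopAdapter implements BatchOperator {
    private final RowOperator delegate;

    RowLoopAdapter(RowOperator delegate) {
        this.delegate = delegate;
    }

    @Override
    public void process(Object[][] rows) {
        for (Object[] row : rows) {
            delegate.processRow(row);
        }
    }
}

// Hypothetical driver that slices the input rows into batches of a given size.
final class BatchingForwarder {
    // Mirrors the suggestion in the thread: start with an array size of 1.
    static final int INITIAL_BATCH_SIZE = 1;

    private BatchingForwarder() {}

    static void forwardAll(List<Object[]> input, BatchOperator op, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < input.size(); start += batchSize) {
            int end = Math.min(start + batchSize, input.size());
            // Copy the slice into an Object[][] so the operator sees a plain array.
            Object[][] batch = input.subList(start, end).toArray(new Object[0][]);
            op.process(batch);
        }
    }
}
```

In this sketch, raising the batch size later changes only the argument passed to forwardAll and leaves the operator contract alone. That stability is the property the incremental approach relies on.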
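Jitendra notes that even basic vectorized expressions need separate handling for repeating values, non-null columns, and nullable columns. A minimal Java sketch of a "column plus scalar" expression shows what those cases look like. LongColumn, LongColumnAddScalar, and the fields vector, isNull, noNulls, and isRepeating are hypothetical names chosen for this illustration, not the branch's actual classes. Chains of expressions and short-circuit optimization are not covered.

```java
// Hypothetical column of long values for one batch.
final class LongColumn {
    final long[] vector;
    final boolean[] isNull;
    boolean noNulls = true;      // when true, isNull[] can be ignored entirely
    boolean isRepeating = false; // when true, only entry 0 is meaningful

    LongColumn(int size) {
        vector = new long[size];
        isNull = new boolean[size];
    }
}

// Hypothetical expression computing out = in + scalar over the first n rows.
final class LongColumnAddScalar {
    private final long scalar;

    LongColumnAddScalar(long scalar) {
        this.scalar = scalar;
    }

    void evaluate(LongColumn in, LongColumn out, int n) {
        if (n == 0) {
            return;
        }
        if (in.isRepeating) {
            // A single value stands for the whole batch, so compute it once.
            out.isRepeating = true;
            out.noNulls = in.noNulls;
            out.isNull[0] = !in.noNulls && in.isNull[0];
            if (!out.isNull[0]) {
                out.vector[0] = in.vector[0] + scalar;
            }
            return;
        }
        out.isRepeating = false;
        if (in.noNulls) {
            // Fast path: a tight loop with no per-row null checks.
            out.noNulls = true;
            for (int i = 0; i < n; i++) {
                out.vector[i] = in.vector[i] + scalar;
            }
        } else {
            // Nullable column: propagate null flags row by row.
            out.noNulls = false;
            for (int i = 0; i < n; i++) {
                out.isNull[i] = in.isNull[i];
                if (!in.isNull[i]) {
                    out.vector[i] = in.vector[i] + scalar;
                }
            }
        }
    }
}
```

Each branch exists only for performance. All three produce the same logical result, which is why the thread stresses careful handling and testing of these cases.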
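The thread proposes a single global flag plus a validation step that falls back to the existing path. A rough Java sketch of how that could fit together follows. Several details are assumptions for illustration only: the key example.vectorization.enabled (not a real Hive setting name), the three enums, and treating an absent key as "off". The supported set (select and filter over primitive types) mirrors the initial scope described in the thread.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plan and type vocabularies, just large enough for the example.
enum OperatorKind { SELECT, FILTER, GROUP_BY, JOIN }
enum ColumnType { INT, BIGINT, DOUBLE, STRING, MAP, STRUCT }
enum ExecutionMode { VECTORIZED, ROW_MODE }

final class VectorizationPlanner {
    // Hypothetical key for the single global flag.
    static final String VECTORIZATION_FLAG = "example.vectorization.enabled";

    // Initial scope from the thread: select and filter over primitive types.
    static final Set<OperatorKind> SUPPORTED_OPERATORS =
            EnumSet.of(OperatorKind.SELECT, OperatorKind.FILTER);
    static final Set<ColumnType> PRIMITIVE_TYPES =
            EnumSet.of(ColumnType.INT, ColumnType.BIGINT, ColumnType.DOUBLE, ColumnType.STRING);

    private VectorizationPlanner() {}

    static ExecutionMode choose(Map<String, String> conf,
                                List<OperatorKind> planOperators,
                                List<ColumnType> columnTypes) {
        // Assumption: the flag is off unless explicitly set to "true".
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(VECTORIZATION_FLAG, "false"));
        if (!enabled) {
            // Flag off: the existing row-at-a-time path is used unchanged.
            return ExecutionMode.ROW_MODE;
        }
        // Validation step: any unsupported operator or column type forces the fallback.
        for (OperatorKind op : planOperators) {
            if (!SUPPORTED_OPERATORS.contains(op)) {
                return ExecutionMode.ROW_MODE;
            }
        }
        for (ColumnType type : columnTypes) {
            if (!PRIMITIVE_TYPES.contains(type)) {
                return ExecutionMode.ROW_MODE;
            }
        }
        return ExecutionMode.VECTORIZED;
    }
}
```

Because the flag is checked before any vectorization logic runs, "flag off" exercises exactly the current code path. That is the case the thread commits to regression-testing.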