[
https://issues.apache.org/jira/browse/HIVE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989084#comment-12989084
]
bharath v commented on HIVE-1938:
---------------------------------
I have a few basic ideas about join costs. Consider the case of the Common Join
(we can extend this to other joins), for which I propose a mathematical approach.
We can treat disk IO and communication cost as the major bottlenecks and
optimize the join based on these factors. The challenge here is predicting the
size of the intermediate tables (when joining more than 2 tables).
Suppose there are "N" machines in the Hive-Hadoop cluster.
Suppose we are joining table "A" on column "a" with table "B" on column
"b". Table A contains n1 rows and table B contains n2 rows. Suppose A.a has "d1"
distinct values and B.b has "d2" distinct values, with d1 <= d2 without loss
of generality. The cost computation of a single join can be divided into 3
phases:
1) Map phase:
Both tables are scanned from disk once, and we can assume this is done in
parallel on the N machines. So the total disk IO cost is estimated as
O((n1+n2)/N) (assuming a constant page size).
2) Shuffle phase:
The cost estimate for this phase depends on the type of join. In the
worst case every row gets transferred to a reducer on a different machine (other
than the one on which it resides), i.e. every row moves across the network.
Assuming some constant ping time (obtained by averaging), the cost is
O(k*(n1+n2)), where k is the cost per unit of data transferred. Actual shuffle
costs may be lower than this due to the "maximizing locality" effect in Hadoop
MR; I am just providing an upper bound, and we can surely improve on it
depending on the type of join.
For example, for a map-side join we add the cost of moving the smaller table
to all the mappers, O(k*N*n3), where n3 is the number of rows in the smaller table.
3) Reduce phase:
In each reducer we just do a simple nested-loop join. Assuming a UNIFORM
DISTRIBUTION of the data, each reducer call gets (n1/d1) rows of "A" and (n2/d2)
rows of "B", so the cost per call is O((n1/d1)*(n2/d2)), and there are at most
d1/N sequential reducer calls per machine. Multiplying by that factor, the total
cost is O((n1*n2)/(N*d2)).
The above cost assumes a uniform distribution of data per table. We can improve
this estimate by maintaining per-node histograms of statistics.
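The three phase costs above can be combined into a small cost function. A minimal sketch, assuming all costs are in arbitrary but comparable units and taking the worst-case shuffle bound (the constant factors hidden by the O-notation are dropped):

```python
def common_join_cost(n1, n2, d1, d2, N, k):
    """Estimate the cost of a Common Join from the three phases above.

    n1, n2 : row counts of tables A and B
    d1, d2 : distinct values of A.a and B.b (d1 <= d2 assumed)
    N      : number of machines in the cluster
    k      : cost per unit of data transferred over the network
    Page size is treated as constant, as in the text.
    """
    map_cost = (n1 + n2) / N            # parallel scan of both tables
    shuffle_cost = k * (n1 + n2)        # worst case: every row moves
    reduce_cost = (n1 * n2) / (N * d2)  # nested-loop join per key,
                                        # d1/N sequential reducer calls
    return map_cost + shuffle_cost + reduce_cost
```

For a map-side join the shuffle term would instead be k*N*n3 for a small table of n3 rows, as noted above.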
After computing the join cost we need to predict the size of the resulting
table, so that its statistics can be used to continue the above procedure for
the rest of the tables.
Estimating the result table size:
The resulting table will have O(n1*n2/d1) rows, assuming that A.a and B.b have
the same distribution of distinct values. This is where we can improve a great
deal by maintaining good statistics, so that we can predict the correct size of
the resulting table.
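The size estimate above, as a sketch (following the comment's formula and its same-distribution assumption; constant factors dropped):

```python
def estimated_join_rows(n1, n2, d1):
    """Estimated output size of A join B: O(n1*n2/d1) rows,
    assuming A.a and B.b share the same distribution of distinct
    values. The result feeds the cost estimate for the next join."""
    return (n1 * n2) / d1
```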
We can then follow a simple System R style dynamic-programming approach, where
in each iteration we keep the best plans according to the above cost formulas.
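The System R style dynamic programming over join orders can be sketched as follows. This is a toy illustration, not Hive code: the cost of a plan is simplified to the sum of its intermediate result sizes, and `out_size` is a hypothetical callback standing in for the size estimate above.

```python
from itertools import combinations

def best_join_order(sizes, out_size):
    """System R style dynamic programming over left-deep join orders.

    sizes    : dict mapping table name -> estimated row count
    out_size : f(left_rows, right_rows) -> estimated rows of the join
               (hypothetical stand-in for the size estimate above)
    Plan cost is modeled as the sum of intermediate result sizes, a
    common simplification; the per-phase formulas above could be
    plugged in instead.
    """
    tables = list(sizes)
    # best[S] = (total_cost, result_rows, join_order) for each subset S
    best = {frozenset([t]): (0, sizes[t], (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for subset in combinations(tables, k):
            S = frozenset(subset)
            candidates = []
            for t in subset:  # peel one table off (left-deep plans)
                cost_rest, rows_rest, order_rest = best[S - {t}]
                rows = out_size(rows_rest, sizes[t])
                candidates.append((cost_rest + rows, rows, order_rest + (t,)))
            best[S] = min(candidates)  # keep only the cheapest plan for S
    return best[frozenset(tables)]
```

For example, with sizes {"A": 1000, "B": 100, "C": 10} and a toy selectivity of 1/100 (out_size = lambda r1, r2: r1 * r2 // 100), the DP joins the two small tables first.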
This is a pretty long post already. This is the basic approach, and we can
improve on it further depending on the feedback I get.
I presented this work as a poster at ACM SIGMOD 2010 and got some positive
feedback.
> Cost Based Query optimization in Hive
> -------------------------------------
>
> Key: HIVE-1938
> URL: https://issues.apache.org/jira/browse/HIVE-1938
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Environment: *nix,java
> Reporter: bharath v
>
> Current optimization in Hive is just rule-based and involves applying a set
> of rules on the plan tree. This depends on hints given by the user (which may
> or may not be correct) and might result in execution of costlier plans. So
> this jira aims at building a cost model which can give a good estimate of
> various plans beforehand (using some meta-data already collected), so we can
> choose the plan which incurs the least cost.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira