To: Mendelson, Assaf
Subject: Re: statistics collection and propagation for cost-based optimizer
They are not yet complete. The benchmark was done with an implementation of
a cost-based optimizer that Huawei had internally for Spark 1.5 (or some
even older version).
On Mon, Nov 14, 2016 at 10:46 PM, Yogesh Mahajan
wrote:
It looks like the Huawei team has run the TPC-H benchmark and some real-world
test cases, and their results show a 2X-5X speedup depending on data volume.
Can we share the numbers and the query-wise rationale behind the gain? Is
there anything done on Spark master yet? Or the implementat
Thanks, Reynold, for the detailed proposals. A few questions/clarifications -
1) How will the existing rule-based operators co-exist with CBO? The existing
rules are heuristic/empirical; I am assuming rules like predicate
pushdown or project pruning will co-exist with CBO and we just want to
accura
Historically, TPC-DS and TPC-H. There is certainly a chance of overfitting to
one or two benchmarks. Note that those will probably be impacted more by the
way we set the parameters for CBO than by using x or y for summary
statistics.
On Monday, November 14, 2016, Shivaram Venkataraman <
shiva...@eec
Do we have any query workloads on which we can benchmark the performance of
these proposals?
Thanks
Shivaram
On Sun, Nov 13, 2016 at 5:53 PM, Reynold Xin wrote:
One additional note: in terms of size, the size of a count-min sketch with
eps = 0.1% and confidence 0.87, uncompressed, is 48k bytes.
To look up what that means, see
http://spark.apache.org/docs/latest/api/java/org/apache/spark/util/sketch/CountMinSketch.html
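
For anyone who wants to check the 48k figure, here is a rough back-of-the-envelope sketch. It assumes the table is sized as width = ceil(2 / eps) and depth = ceil(-ln(1 - confidence) / ln 2), with one 8-byte counter per cell; those sizing formulas and the function name are my assumptions, not something stated in this thread, and the count covers only the counter table (no serialization header or hash seeds).

```python
import math


def count_min_sketch_table_bytes(eps, confidence, counter_bytes=8):
    """Estimate the uncompressed counter-table size of a count-min sketch.

    Assumed sizing (not confirmed in this thread):
      width = ceil(2 / eps)
      depth = ceil(-ln(1 - confidence) / ln 2)
    Each cell is assumed to be one 8-byte counter.
    """
    width = math.ceil(2 / eps)
    depth = math.ceil(-math.log(1 - confidence) / math.log(2))
    return depth, width, depth * width * counter_bytes


# eps = 0.1% relative error, confidence 0.87, as in the message above
depth, width, size = count_min_sketch_table_bytes(eps=0.001, confidence=0.87)
print(depth, width, size)  # prints: 3 2000 48000
```

Under those assumptions the table is 3 rows by 2000 columns, i.e. 48,000 bytes, which lines up with the number quoted above.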
On Sun, Nov 13, 2016 at 5:30 PM,