Re: Spark performance tests

2017-01-10 Thread Kazuaki Ishizaki
Hi, You may find several micro-benchmarks under https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark . Regards, Kazuaki Ishizaki From: Prasun Ratn To: Apache Spark Dev Date: 2017/01/10 12:52 Subject: Spark performance tes
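For ad-hoc measurements outside those suites, a minimal hand-rolled timing sketch can be useful; the object name, dataset size, and query below are illustrative only and are not taken from the benchmarks linked above.

```scala
// Minimal sketch of a hand-rolled Spark SQL micro-benchmark (illustrative only).
import org.apache.spark.sql.SparkSession

object SimpleSqlBenchmark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SimpleSqlBenchmark")
      .getOrCreate()

    // Time a query several times and report the best run, to smooth out JIT/warm-up noise.
    def time(name: String, iters: Int = 5)(f: => Unit): Unit = {
      val times = (1 to iters).map { _ =>
        val start = System.nanoTime()
        f
        (System.nanoTime() - start) / 1e6
      }
      println(f"$name: best ${times.min}%.1f ms over $iters runs")
    }

    val df = spark.range(0, 10L * 1000 * 1000)
    time("range sum + group by") {
      df.selectExpr("id % 100 AS k", "id AS v").groupBy("k").sum("v").collect()
    }

    spark.stop()
  }
}
```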

Re: Spark performance tests

2017-01-10 Thread Adam Roberts
Hi, I suggest HiBench and SparkSqlPerf. HiBench features many benchmarks that exercise several components of Spark (great for stressing core, SQL, and MLlib capabilities); SparkSqlPerf features 99 TPC-DS queries (stressing the DataFrame API and therefore the Catalyst optimiser); both work

[SQL][CodeGen] Is there a way to set break point and debug the generated code?

2017-01-10 Thread dragonly
I have recently been hacking on Spark SQL, trying to add some new UDTs and functions, as well as some new Expression classes. I ran into a problem with the return type of the nullSafeEval method. In one of the new Expression classes, I want to return an array of my UDT, and my code is like `return new
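For context, here is a minimal sketch against Spark 2.x Catalyst internals (RepeatTwice is a made-up example, not code from this thread) of a unary expression whose nullSafeEval returns an array; the key point is that interpreted evaluation is expected to return Catalyst's internal ArrayData, not a plain array of external/UDT objects.

```scala
// Sketch only, assuming Spark 2.x Catalyst internals; RepeatTwice is a
// hypothetical example expression, not from the thread.
import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.catalyst.util.GenericArrayData
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType}

// Returns [x, x] for a double input. CodegenFallback avoids writing doGenCode
// while experimenting with interpreted evaluation.
case class RepeatTwice(child: Expression)
    extends UnaryExpression with CodegenFallback {

  override def dataType: DataType = ArrayType(DoubleType, containsNull = false)

  // nullSafeEval should hand back the Catalyst-internal form of the result:
  // for an array type that is ArrayData (e.g. GenericArrayData), not Array[T]
  // of the external class.
  override protected def nullSafeEval(input: Any): Any = {
    val d = input.asInstanceOf[Double]
    new GenericArrayData(Array(d, d))
  }
}
```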

Re: Spark performance tests

2017-01-10 Thread Prasun Ratn
Thanks Adam, Kazuaki! On Tue, Jan 10, 2017 at 3:28 PM, Adam Roberts wrote: > Hi, I suggest HiBench and SparkSqlPerf, HiBench features many benchmarks > within it that exercise several components of Spark (great for stressing > core, sql, MLlib capabilities), SparkSqlPerf features 99 TPC-DS querie

Re: [SQL][CodeGen] Is there a way to set break point and debug the generated code?

2017-01-10 Thread Reynold Xin
It's unfortunately difficult to debug -- that's one downside of codegen. You can dump all the code via "explain codegen" though. That's typically enough for me to debug. On Tue, Jan 10, 2017 at 3:21 AM, dragonly wrote: > I am recently hacking into the SparkSQL and trying to add some new udts an
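As a small illustrative snippet (not from the thread, assuming a Spark 2.x session and the execution.debug helpers), the generated code can be dumped either from SQL or from a Dataset:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._   // brings debugCodegen() into scope

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(1000).selectExpr("id % 7 AS k").createOrReplaceTempView("t")

// 1. Via SQL, as suggested above ("explain codegen"):
spark.sql("EXPLAIN CODEGEN SELECT k, count(*) FROM t GROUP BY k").show(truncate = false)

// 2. Via the debug package, on any Dataset/DataFrame:
spark.table("t").groupBy("k").count().debugCodegen()
```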

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-10 Thread Andy Dang
Thanks. It appears that TypedImperativeAggregate won't be available till 2.2.x. I'm stuck with my RDD approach then :( --- Regards, Andy On Tue, Jan 10, 2017 at 2:01 AM, Liang-Chi Hsieh wrote: > > Hi Andy, > > Because hash-based aggregate uses unsafe row as aggregation states, so the > aggr
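The point in Liang-Chi's reply is that the hash-based aggregate keeps its state in an UnsafeRow, so only fixed-width (mutable) buffer field types qualify for that path. A hedged sketch of a pre-2.2 UDAF whose buffer uses only such fields (the class and field names are my own, not from the thread):

```scala
// Sketch only: a UDAF whose aggregation buffer uses only fixed-width types
// (DoubleType, LongType), the kind of state an UnsafeRow-backed, hash-based
// aggregate can hold. Names are illustrative, not from the thread.
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class MeanOfSquares extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  // Fixed-width buffer fields only: a running sum of squares and a count.
  override def bufferSchema: StructType =
    StructType(StructField("sumSq", DoubleType) :: StructField("count", LongType) :: Nil)
  override def dataType: DataType = DoubleType
  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      val v = input.getDouble(0)
      buffer(0) = buffer.getDouble(0) + v * v
      buffer(1) = buffer.getLong(1) + 1L
    }
  }
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }
  override def evaluate(buffer: Row): Any =
    if (buffer.getLong(1) == 0L) null else buffer.getDouble(0) / buffer.getLong(1)
}
```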

Re: Tests failing with GC limit exceeded

2017-01-10 Thread shane knapp
quick update: things are looking slightly... better. the number of failing builds due to GC overhead has decreased slightly since the reboots last week... in fact, in the last three days the only builds to be affected are spark-master-test-maven-hadoop-2.7 (three failures) and spark-master-test

Re: [SQL][PYTHON] UDF improvements.

2017-01-10 Thread Maciej Szymkiewicz
Thanks for your response, Ryan. Here you are: https://issues.apache.org/jira/browse/SPARK-19159 On 01/09/2017 07:30 PM, Ryan Blue wrote: > Maciej, this looks great. > > Could you open a JIRA issue for improving the @udf decorator and > possibly sub-tasks for the specific features from the gist? Tha