Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Nicholas Chammas
Alright, sounds good! I've created databricks/spark-perf/issues/9 as a reminder for us to add a new test once we've root-caused SPARK-. On Tue, Sep 2, 2014 at 1:07 AM, Patrick Wendell wrote: > Yeah, this wasn't detected in our performance…

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Patrick Wendell
Yeah, this wasn't detected in our performance tests. We even have a test in PySpark that I would have thought might catch this (it just schedules a bunch of really small tasks, similar to the regression case). https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51 Anyways…
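For reference, a rough Scala analogue of that test (illustrative only; the real test is the linked PySpark file, and the task count here is made up) could be:

    import org.apache.spark.SparkContext

    // Sketch: expose per-task scheduling/serialization overhead by running
    // many tiny tasks. Each task does almost no work, so total runtime is
    // dominated by scheduler overhead.
    def timeManySmallTasks(sc: SparkContext, numTasks: Int = 10000): Double = {
      val start = System.nanoTime()
      sc.parallelize(1 to numTasks, numTasks).map(_ + 1).count()
      (System.nanoTime() - start) / 1e9  // elapsed seconds
    }

Comparing that number across two builds (say, 1.0.2 vs. a 1.1.0 RC) is what would surface a regression of this kind.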

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Matei Zaharia
Nope, actually, they didn't find that (they found some other things that were fixed, as well as some improvements). Feel free to send a PR, but it would be good to profile the issue first to understand what slowed down. (For example, is the map phase taking longer or is it the reduce phase…
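One hedged way to do that first-pass profiling in Scala is to force and time each phase separately; a minimal sketch, assuming a live SparkContext `sc` and arbitrary data sizes:

    // Rough sketch: time the map phase alone, then map + reduce; the
    // difference approximates the cost of the shuffle/reduce phase
    // (note the map side is recomputed in the second run).
    val data = sc.parallelize(1 to 1000000, 100)

    def timed[A](label: String)(body: => A): A = {
      val t0 = System.nanoTime()
      val result = body
      println(f"$label: ${(System.nanoTime() - t0) / 1e9}%.2f s")
      result
    }

    val mapped = data.map(i => (i % 1000, 1))
    timed("map only")(mapped.count())                        // no shuffle
    timed("map + reduce")(mapped.reduceByKey(_ + _).count()) // with shuffle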

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Nicholas Chammas
Oh, that's sweet. So, a related question then. Did those tests pick up the performance issue reported in SPARK- ? Does it make sense to add a new test to cover that case? On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia wrote: > Hi Nicholas,…

Re: Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Matei Zaharia
Hi Nicholas, At Databricks we already run https://github.com/databricks/spark-perf for each release, which is a more comprehensive performance test suite. Matei On September 1, 2014 at 8:22:05 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: What do people think of running the Big Data Benchmark…

Run the "Big Data Benchmark" for new releases

2014-09-01 Thread Nicholas Chammas
What do people think of running the Big Data Benchmark (repo) as part of preparing every new release of Spark? We'd run it just for Spark and effectively use it as another type of test to track any performance…

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-01 Thread Andrew Or
+1. Tested all the basic applications under both deploy modes (where applicable) in the following environments: - locally on OS X 10.9 - locally on Windows 8.1 - standalone cluster - YARN cluster built with Hadoop 2.4. From this front I have observed no regressions, and verified that standalone-cluster…

Re: Jira tickets for starter tasks

2014-09-01 Thread Josh Rosen
A number of folks have emailed me to add them, but I’ve been unable to find their usernames in the Apache JIRA. Note that you need to have an account at issues.apache.org, which may or may not have the same email / username as your accounts on any other Apache systems, including CWiki. Even if…

RE: HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-09-01 Thread chutium
Thanks a lot, Hao, that finally solved this problem. The changes to CSVSerDe are here: https://github.com/chutium/csv-serde/commit/22c667c003e705613c202355a8791978d790591e Btw, "add jar" in Spark Hive or hive-thriftserver never works for us, so we build Spark with libraryDependencies += "csv-serde" ...
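The build-time workaround described there would look roughly like this in build.sbt (a sketch only; the group/artifact/version coordinates below are hypothetical placeholders, since the actual line is elided above):

    // Bundle the SerDe into the Spark assembly jar so a runtime
    // "add jar" is unnecessary; coordinates are placeholders.
    libraryDependencies += "com.example" % "csv-serde" % "1.0.0"

With the class already on the assembly classpath, a Hive CREATE TABLE ... ROW FORMAT SERDE clause can reference it directly.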

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-01 Thread Prashant Sharma
An easier and quicker way to build Spark is sbt/sbt assembly/assembly. Prashant Sharma On Mon, Sep 1, 2014 at 8:40 PM, Nicholas Chammas wrote: > If this is not a confirmed regression from 1.0.2, I think it's better to > report it in a separate thread or JIRA. > > I believe serious regressions are…

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-01 Thread Nicholas Chammas
If this is not a confirmed regression from 1.0.2, I think it's better to report it in a separate thread or JIRA. I believe serious regressions are generally the only reason to block a new release. Otherwise, if this is an old issue, it should be handled separately. On Monday, September 1, 2014, chutium wrote:…

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-01 Thread chutium
I didn't try it with 1.0.2; it always takes too long to build the Spark assembly jars... more than 20 min. [info] Packaging /mnt/some-nfs/common/spark/assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop1.0.3-mapr-3.0.3.jar ... [info] Packaging /mnt/some-nfs/common/spark/examples/target/scala-…