Re: Spark 0.9.1 release

2014-03-27 Thread Tathagata Das
I have cut another release candidate, RC3, with two important bug fixes. See the following JIRAs for more details.
1. Bug with intercepts in MLlib's GLM: https://spark-project.atlassian.net/browse/SPARK-1327
2. Bug in PySpark's RDD.top() ordering: https://spark-project.atlassian.net/browse/SPARK-13
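The RDD.top() fix referenced above concerns ordering: top(n) should return the n largest elements in descending order. A pure-Python sketch of those semantics using heapq (an illustration only, not PySpark's actual implementation):

```python
import heapq

def top(iterable, num):
    """Return the `num` largest elements in descending order:
    the ordering RDD.top() is expected to follow after the fix.
    A pure-Python sketch, not PySpark's implementation."""
    return heapq.nlargest(num, iterable)

print(top([3, 7, 1, 9, 4], 3))  # → [9, 7, 4]
```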

Re: Spark 0.9.1 release

2014-03-26 Thread Tathagata Das
Updates: 1. Fix for the ASM problem that Kevin mentioned is already in Spark 0.9.1 RC2. 2. Fix for PySpark's RDD.top() that Patrick mentioned has been pulled into branch 0.9. This will get int

Re: Spark 0.9.1 release

2014-03-26 Thread Patrick Wendell
Hey TD, This one we just merged into master this morning: https://spark-project.atlassian.net/browse/SPARK-1322 It should definitely go into the 0.9 branch because there was a bug in the semantics of top() which at this point is unreleased in Python. I didn't backport it yet because I figured yo

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 10:53 AM, Tathagata Das wrote: > PR 159 seems like a fairly big patch to me. And quite recent, so its impact > on the scheduling is not clear. It may also depend on other changes that > may have gotten into the DAGScheduler but not pulled into branch 0.9. I am > not sure it

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
On Wed, Mar 26, 2014 at 11:04 AM, Kay Ousterhout wrote: > I don't think the blacklisting is a priority and the CPUS_PER_TASK issue > was still broken after this patch (so broken that I'm convinced no one > actually uses this feature!!), so agree with TD's sentiment that this > shouldn't go into 0.

Re: Spark 0.9.1 release

2014-03-25 Thread Kay Ousterhout
I don't think the blacklisting is a priority and the CPUS_PER_TASK issue was still broken after this patch (so broken that I'm convinced no one actually uses this feature!!), so agree with TD's sentiment that this shouldn't go into 0.9.1. On Tue, Mar 25, 2014 at 10:23 PM, Tathagata Das wrote: >
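For context, the CPUS_PER_TASK value under discussion is driven by a Spark configuration property; a minimal illustration, assuming the 0.9-era property name spark.task.cpus (default 1):

```
# Reserve 2 CPU cores per task in the scheduler
# (the CPUS_PER_TASK setting under discussion); illustrative only.
spark.task.cpus   2
```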

Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
PR 159 seems like a fairly big patch to me. And quite recent, so its impact on the scheduling is not clear. It may also depend on other changes that may have gotten into the DAGScheduler but not pulled into branch 0.9. I am not sure it is a good idea to pull that in. We can pull those changes later

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
Forgot to mention this in the earlier request for PRs. If there is another RC being cut, please add https://github.com/apache/spark/pull/159 to it too (if not done already!). Thanks, Mridul On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das wrote: > Hello everyone, > > Since the release of Spark

Re: Spark 0.9.1 release

2014-03-25 Thread Tathagata Das
@evan From the discussion in the JIRA, it seems that we still don't have a clear solution for SPARK-1138. Nor do we have a sense of whether the solution is going to be small enough for a maintenance release. So I don't think we should block the release of Spark 0.9.1 for this. We can make another Spark

Re: Spark 0.9.1 release

2014-03-25 Thread Kevin Markey
TD: A correct shading of ASM should only affect Spark code unless someone is relying on ASM 4.0 in unrelated project code, in which case they can add org.ow2.asm:asm:4.x as a dependency. Our short term solution has been to repackage other libraries with a 3.2 dependency or to exclude ASM whe
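Kevin's workaround of pinning ASM explicitly would look roughly like this in a Maven POM (version 4.0 chosen for illustration; any 4.x release would do):

```xml
<!-- Pin the org.ow2.asm artifact explicitly if unrelated project
     code relies on ASM 4.x; illustrative coordinates only. -->
<dependency>
  <groupId>org.ow2.asm</groupId>
  <artifactId>asm</artifactId>
  <version>4.0</version>
</dependency>
```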

Re: Spark 0.9.1 release

2014-03-25 Thread Evan Chan
Hey guys, I think SPARK-1138 should be resolved before releasing Spark 0.9.1. It's affecting multiple users' ability to use Spark 0.9 with various versions of Hadoop. I have one fix but not sure if it works for others. -Evan On Mon, Mar 24, 2014 at 5:30 PM, Tathagata Das wrote: > Hello Kevin, >

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
Hello Kevin, A fix for SPARK-782 would definitely simplify building against Spark. However, it's possible that a fix for this issue in 0.9.1 will break the builds (that reference Spark) of existing 0.9 users, either due to a change in the ASM version, or for being incompatible with their current wo

Re: Spark 0.9.1 release

2014-03-24 Thread Kevin Markey
Is there any way that [SPARK-782] (Shade ASM) can be included? I see that it is not currently backported to 0.9. But there is no single issue that has caused us more grief as we integrate spark-core with other project dependencies. There are way too many libraries out there in addition to Sp

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
1051 has been pulled in! Search 1051 in https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-0.9 TD On Mon, Mar 24, 2014 at 4:26 PM, Kevin Markey wrote: > 1051 is essential! > I'm not sure about the others, but anything that adds stability to > Spark/Yarn would be

Re: Spark 0.9.1 release

2014-03-24 Thread Kevin Markey
1051 is essential! I'm not sure about the others, but anything that adds stability to Spark/Yarn would be helpful. Kevin Markey On 03/20/2014 01:12 PM, Tom Graves wrote: I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YARN - JIRA and [SPARK-1051] On Yarn, execu

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
Patrick, yes, that is indeed a risk. On Mon, Mar 24, 2014 at 12:30 AM, Tathagata Das wrote: > Patrick, that is a good point. > > > On Mon, Mar 24, 2014 at 12:14 AM, Patrick Wendell wrote: > >> > Spark's dependency graph in a maintenance >> *Modifying* Spark's dependency graph... >> -- Evan

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
Patrick, that is a good point. On Mon, Mar 24, 2014 at 12:14 AM, Patrick Wendell wrote: > > Spark's dependency graph in a maintenance > *Modifying* Spark's dependency graph... >

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
> Spark's dependency graph in a maintenance *Modifying* Spark's dependency graph...

Re: Spark 0.9.1 release

2014-03-24 Thread Patrick Wendell
Hey Evan and TD, Modifying Spark's dependency graph in a maintenance release seems potentially harmful, especially upgrading a minor version (not just a patch version) like this. This could affect other downstream users. For instance, now, without knowing, their fastutil dependency gets bumped and they hit so

Re: Spark 0.9.1 release

2014-03-24 Thread Evan Chan
@Tathagata, the PR is here: https://github.com/apache/spark/pull/215 On Mon, Mar 24, 2014 at 12:02 AM, Tathagata Das wrote: > @Shivaram, That is a useful patch but I am bit afraid merge it in. > Randomizing the executor has performance implications, especially for Spark > Streaming. The non-rand

Re: Spark 0.9.1 release

2014-03-24 Thread Tathagata Das
@Shivaram, That is a useful patch but I am a bit afraid to merge it in. Randomizing the executor offers has performance implications, especially for Spark Streaming. The non-randomized ordering of allocating machines to tasks was subtly helping to speed up certain window-based shuffle operations. For example,

Re: Spark 0.9.1 release

2014-03-23 Thread Evan Chan
I also have a really minor fix for SPARK-1057 (upgrading fastutil), could that also make it in? -Evan On Sun, Mar 23, 2014 at 11:01 PM, Shivaram Venkataraman wrote: > Sorry this request is coming in a bit late, but would it be possible to > backport SPARK-979[1] to branch-0.9 ? This is the pat

Re: Spark 0.9.1 release

2014-03-23 Thread Shivaram Venkataraman
Sorry this request is coming in a bit late, but would it be possible to backport SPARK-979[1] to branch-0.9 ? This is the patch for randomizing executor offers and I would like to use this in a release sooner rather than later. Thanks Shivaram [1] https://github.com/apache/spark/commit/556c56689b

Re: Spark 0.9.1 release

2014-03-20 Thread Bhaskar Dutta
Thank you! We plan to test out 0.9.1 on YARN once it is out. Regards, Bhaskar On Fri, Mar 21, 2014 at 12:42 AM, Tom Graves wrote: > I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running > on YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as > submitting user - J

Re: Spark 0.9.1 release

2014-03-20 Thread Patrick Wendell
Thanks Tom, After I looked more at this patch I don't see how this could have regressed behavior for any users (it seems like it only pertains to warnings and instructions). So maybe the user mistook this patch for a different issue. https://github.com/apache/incubator-spark/pull/553/files - Pat

Re: Spark 0.9.1 release

2014-03-20 Thread Tom Graves
Thanks for the heads up, saw that and will make sure that is resolved before pulling into 0.9. Unless I'm missing something, they should just use sc.addJar to distribute the jar rather than relying on SPARK_YARN_APP_JAR. Tom On Thursday, March 20, 2014 3:31 PM, Patrick Wendell wrote: Hey

Re: Spark 0.9.1 release

2014-03-20 Thread Patrick Wendell
Hey Tom, > I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on > YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as submitting > user - JIRA in. The pyspark one I would consider more of an enhancement so > might not be appropriate for a point release. Some

Re: Spark 0.9.1 release

2014-03-20 Thread Tom Graves
I'll pull [SPARK-1053] Should not require SPARK_YARN_APP_JAR when running on YARN - JIRA and [SPARK-1051] On Yarn, executors don't doAs as submitting user - JIRA in. The pyspark one I would consider more of an enhancement so might not be appropriate for a point release. [SPARK-1053] Shoul

Re: Spark 0.9.1 release

2014-03-20 Thread Bhaskar Dutta
It will be great if "SPARK-1101: Umbrella for hardening Spark on YARN" can get into 0.9.1. Thanks, Bhaskar On Thu, Mar 20, 2014 at 5:37 AM, Tathagata Das wrote: > Hello everyone, > > Since the release of Spark 0.9, we have received a number

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
If 1.0 is just round the corner, then it is fair enough to push to that, thanks for clarifying! Regards, Mridul On Wed, Mar 19, 2014 at 6:12 PM, Tathagata Das wrote: > I agree that the garbage collection > PR would make things very > convenient in a lot

Re: Spark 0.9.1 release

2014-03-19 Thread Tathagata Das
I agree that the garbage collection PR would make things very convenient in a lot of use cases. However, there are two broad reasons why it is hard for that PR to get into 0.9.1. 1. The PR still needs some amount of work and quite a lot of testing. While we

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
Would be great if the garbage collection PR is also committed - if not the whole thing, at least the part to unpersist broadcast variables explicitly. Currently we are running with a custom impl which does something similar, and I would like to move to the standard distribution for that.

Spark 0.9.1 release

2014-03-19 Thread Tathagata Das
Hello everyone, Since the release of Spark 0.9, we have received a number of important bug fixes and we would like to make a bug-fix release of Spark 0.9.1. We are going to cut a release candidate soon and we would love it if people test it out. We have backported several bug fixes into the 0.9 a