Hi
I am currently implementing an algorithm involving matrix multiplication.
Basically, I have matrices represented as RDD[Array[Double]]. For example,
if I have A: RDD[Array[Double]] and B: RDD[Array[Double]], what would be
the most efficient way to get C = A * B?
Both A and B are large, so it w
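A minimal sketch of the common first approach, assuming one of the two
matrices (here B, already collected locally) is small enough to broadcast to
each executor; when both matrices are truly too large for that, a
block-partitioned shuffle is needed instead. The method name and signature
below are illustrative, not from the thread:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Multiply each row of A against a broadcast copy of B:
    // C(i)(j) = sum over k of A(i)(k) * B(k)(j).
    def multiplyBroadcast(sc: SparkContext,
                          a: RDD[Array[Double]],
                          b: Array[Array[Double]]): RDD[Array[Double]] = {
      val bB = sc.broadcast(b)
      a.map { row =>
        val bLocal = bB.value
        val out = new Array[Double](bLocal(0).length)
        for (j <- out.indices; k <- row.indices)
          out(j) += row(k) * bLocal(k)(j)
        out
      }
    }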
Qing Yang, Andy is correct in answering your direct question.
At the same time, depending on your context, you may be able to apply a
pattern where you turn the single Spark application into a service, and
multiple clients of that service can indeed share access to the same RDDs.
Several groups h
RDDs cannot currently be shared across multiple SparkContexts without using
something like the Tachyon project (which is a separate project/codebase).
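A hedged sketch of that service pattern: one long-lived SparkContext owns the
cached RDDs, and multiple clients query it through whatever RPC or HTTP front
end you like (omitted here). The class name, method, and HDFS path are all
hypothetical:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    class SharedRddService(sc: SparkContext) {
      // Loaded and cached once; every client request reuses the same RDD.
      private val events: RDD[String] =
        sc.textFile("hdfs:///data/events").cache()

      // Safe to call from multiple client threads; Spark schedules the
      // resulting jobs within the single application.
      def countMatching(pattern: String): Long =
        events.filter(_.contains(pattern)).count()
    }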
Andy
On May 16, 2014 2:14 PM, "qingyang li" wrote:
>
>
Yup, this is a good point: the interface includes stuff like launch scripts and
environment variables. However I do think that the current features of
spark-submit can all be supported in future releases. We’ll definitely have a
very strict standard for modifying these later on.
Matei
On May 1
I don't understand. We never said that interfaces wouldn't change from 0.9
to 1.0. What we are committing to is stability going forward from the
1.0.0 baseline. Nobody is disputing that backward-incompatible behavior or
interface changes would be an issue post-1.0.0. The question is whether
the
While developers may appreciate "1.0 == API stability," I'm not sure that will
be the understanding of the VP who gives the green light to a Spark-based
development effort.
I fear a bug that silently produces erroneous results will be perceived like
the FDIV bug, but in this case without the mo
I would make the case for interface stability, not just API stability.
Particularly given that we have significantly changed some of our
interfaces, I want to ensure developers/users are not seeing red flags.
Bugs and code stability can be addressed in minor releases if found, but
behavioral change
We do actually have replicated StorageLevels in Spark. You can use
MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
replication factor.
BTW you guys should probably have this discussion on the JIRA rather than the
dev list; I think the replies somehow ended up on the de
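A minimal sketch of both options Matei mentions; the four-argument
StorageLevel factory shown here (useDisk, useMemory, deserialized,
replication) matches the 1.0-era API, so check it against your version:

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext("local", "replication-demo")

    // Built-in level with a replication factor of 2:
    sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_AND_DISK_2)

    // Custom level with 3x replication
    // (useDisk = true, useMemory = true, deserialized = false, replication = 3):
    val threeWay = StorageLevel(true, true, false, 3)
    sc.parallelize(1 to 1000).persist(threeWay)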
BTW for what it’s worth I agree this is a good option to add, the only tricky
thing will be making sure the checkpoint blocks are not garbage-collected by
the block store. I don’t think they will be though.
Matei
On May 17, 2014, at 2:20 PM, Matei Zaharia wrote:
> We do actually have replicate
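For reference, the existing reliable-checkpointing API that this proposal
would complement looks roughly like this, assuming sc is a live SparkContext;
the HDFS paths are illustrative:

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")
    val data = sc.textFile("hdfs:///data/input").cache()
    data.checkpoint()   // marked here; written out by the first action
    data.count()        // materializes the RDD and its checkpoint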
On 18-May-2014 1:45 am, "Mark Hamstra" wrote:
>
> I'm not trying to muzzle the discussion. All I am saying is that we don't
> need to have the same discussion about 0.10 vs. 1.0 that we already had.
Agreed, no point in repeating the same discussion ... I am also trying to
understand what the con
As others have said, the 1.0 milestone is about API stability, not about saying
“we’ve eliminated all bugs”. The sooner you declare 1.0, the sooner users can
confidently build on Spark, knowing that the application they build today will
still run on Spark 1.9.9 three years from now. This is some
I'm not trying to muzzle the discussion. All I am saying is that we don't
need to have the same discussion about 0.10 vs. 1.0 that we already had.
If you can tell me about specific changes in the current release candidate
that occasion new arguments for why a 1.0 release is an unacceptable idea,
+1
Reran my tests from rc5:
* Built the release from source.
* Compiled Java and Scala apps that interact with HDFS against it.
* Ran them in local mode.
* Ran them against a pseudo-distributed YARN cluster in both yarn-client
mode and yarn-cluster mode.
On Sat, May 17, 2014 at 10:08 AM, Andrew
On 17-May-2014 11:40 pm, "Mark Hamstra" wrote:
>
> That is a past issue that we don't need to be re-opening now. The present
Huh? If we need to revisit based on changed circumstances, we must - the
scope of changes introduced in this release was definitely not anticipated
when 1.0 vs 0.10 discu
+1 on the running commentary here, non-binding of course :-)
On Sat, May 17, 2014 at 8:44 AM, Andrew Ash wrote:
> +1 on the next release feeling more like a 0.10 than a 1.0
> On May 17, 2014 4:38 AM, "Mridul Muralidharan" wrote:
>
> > I had echoed similar sentiments a while back when there was
That is a past issue that we don't need to be re-opening now. The present
issue, and what I am asking, is which pending bug fixes does anyone
anticipate will require breaking the public API guaranteed in rc9?
On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan wrote:
> We made incompatible api
+1
2014-05-17 8:53 GMT-07:00 Mark Hamstra:
> +1
>
>
> On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote:
>
> > I'll start the voting with a +1.
> >
> > On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell
> > wrote:
> > > Please vote on releasing the following candidate as Apache Spark
>
We made incompatible API changes whose impact we don't yet completely know,
both from an implementation and a usage point of view.
We had the option of getting real-world feedback from the user community if
we had gone to 0.10, but the Spark developers seemed to be in a hurry to get
to 1.0 - so I made
On Sat, May 17, 2014 at 4:52 PM, Mark Hamstra wrote:
> Which of the unresolved bugs in spark-core do you think will require an
> API-breaking change to fix? If there are none of those, then we are still
> essentially on track for a 1.0.0 release.
I don't have a particular one in mind, but look a
+1 on the next release feeling more like a 0.10 than a 1.0
On May 17, 2014 4:38 AM, "Mridul Muralidharan" wrote:
> I had echoed similar sentiments a while back when there was a discussion
> around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
> changes, add missing functionalit
+1
On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote:
> I'll start the voting with a +1.
>
> On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell
> wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> 1.0.0!
> > This has one bug fix and one minor feature on t
Which of the unresolved bugs in spark-core do you think will require an
API-breaking change to fix? If there are none of those, then we are still
essentially on track for a 1.0.0 release.
The number of contributions and pace of change now is quite high, but I
don't think that waiting for the pace
I suspect this is an issue we have fixed internally here as part of a
larger change - the issue we fixed was not a config issue but bugs in Spark.
Unfortunately we plan to contribute this as part of 1.1
Regards,
Mridul
On 17-May-2014 4:09 pm, "sam (JIRA)" wrote:
> sam created SPARK-1867:
>
I had echoed similar sentiments a while back when there was a discussion
around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
changes, add missing functionality, go through a hardening release before
1.0
But the community preferred a 1.0 :-)
Regards,
Mridul
On 17-May-2014 3:19
On this note, non-binding commentary:
Releases happen in local minima of change, usually created by
internally enforced code freeze. Spark is incredibly busy now due to
external factors -- recently a TLP, recently discovered by a large new
audience, ease of contribution enabled by GitHub. It's get
We don't have 3x replication in Spark :-)
And while using a replicated StorageLevel decreases the odds of failure, it
does not eliminate it (since we are not doing a great job with replication
anyway from a fault-tolerance point of view).
Also it does take a nontrivial performance hit with replicated
Can you try moving your mapPartitions to another class/object which is
referenced only after sc.addJar?
I would suspect the ClassNotFoundException is coming while loading the class
containing mapPartitions before addJar is executed.
In general though, dynamic loading of classes means you use reflection to
instantia
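A hedged sketch of the restructuring being suggested; the object names, jar
path, and input argument are illustrative. The point is that the class whose
closure runs in mapPartitions is not referenced until after sc.addJar has
shipped the jar:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object LateLoaded {
      // Only loaded when first referenced, i.e. after addJar below.
      def lineLengths(rdd: RDD[String]): RDD[Int] =
        rdd.mapPartitions(_.map(_.length))
    }

    object Main {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "addjar-demo")
        sc.addJar("/path/to/dependency.jar")  // ship the jar first...
        val lengths = LateLoaded.lineLengths(sc.textFile(args(0)))  // ...then touch the class
        println(lengths.count())
      }
    }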
Please vote on releasing the following candidate as Apache Spark version 1.0.0!
This has one bug fix and one minor feature on top of rc8:
SPARK-1864: https://github.com/apache/spark/pull/808
SPARK-1808: https://github.com/apache/spark/pull/799
The tag to be voted on is v1.0.0-rc9 (commit 920f947):
I'll start the voting with a +1.
On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.0.0!
> This has one bug fix and one minor feature on top of rc8:
> SPARK-1864: https://github.com/apache/spark/pull/808
> SPARK
Cancelled in favor of rc9.
On Sat, May 17, 2014 at 12:51 AM, Patrick Wendell wrote:
> Due to the issue discovered by Michael, this vote is cancelled in favor of
> rc9.
>
> On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust
> wrote:
>> -1
>>
>> We found a regression in the way configuration is pa
Due to the issue discovered by Michael, this vote is cancelled in favor of rc9.
On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust
wrote:
> -1
>
> We found a regression in the way configuration is passed to executors.
>
> https://issues.apache.org/jira/browse/SPARK-1864
> https://github.com/apache