Hi
I am currently implementing an algorithm involving matrix multiplication.
Basically, I have matrices represented as RDD[Array[Double]]. For example,
if I have A: RDD[Array[Double]] and B: RDD[Array[Double]], what would be
the most efficient way to get C = A * B?
Both A and B are large, so it w
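A minimal sketch of the common first approach, assuming one of the two
matrices (here B, already collected locally) is small enough to broadcast to
each executor; when both matrices are truly too large for that, a
block-partitioned shuffle is needed instead. The method name and signature
below are illustrative, not from the thread:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Multiply each row of A against a broadcast copy of B:
    // C(i)(j) = sum over k of A(i)(k) * B(k)(j).
    def multiplyBroadcast(sc: SparkContext,
                          a: RDD[Array[Double]],
                          b: Array[Array[Double]]): RDD[Array[Double]] = {
      val bB = sc.broadcast(b)
      a.map { row =>
        val bLocal = bB.value
        val out = new Array[Double](bLocal(0).length)
        for (j <- out.indices; k <- row.indices)
          out(j) += row(k) * bLocal(k)(j)
        out
      }
    }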
Qing Yang, Andy is correct in answering your direct question.
At the same time, depending on your context, you may be able to apply a
pattern where you turn the single Spark application into a service, and
multiple clients of that service can indeed share access to the same RDDs.
Several groups h
RDDs cannot currently be shared across multiple SparkContexts without using
something like the Tachyon project (which is a separate project/codebase).
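A hedged sketch of that service pattern: one long-lived SparkContext owns the
cached RDDs, and multiple clients query it through whatever RPC or HTTP front
end you like (omitted here). The class name, method, and HDFS path are all
hypothetical:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    class SharedRddService(sc: SparkContext) {
      // Loaded and cached once; every client request reuses the same RDD.
      private val events: RDD[String] =
        sc.textFile("hdfs:///data/events").cache()

      // Safe to call from multiple client threads; Spark schedules the
      // resulting jobs within the single application.
      def countMatching(pattern: String): Long =
        events.filter(_.contains(pattern)).count()
    }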
Andy
On May 16, 2014 2:14 PM, "qingyang li" wrote:
>
>
Yup, this is a good point: the interface includes stuff like launch scripts and
environment variables. However I do think that the current features of
spark-submit can all be supported in future releases. We’ll definitely have a
very strict standard for modifying these later on.
Matei
On May 1
I don't understand. We never said that interfaces wouldn't change from 0.9
to 1.0. What we are committing to is stability going forward from the
1.0.0 baseline. Nobody is disputing that backward-incompatible behavior or
interface changes would be an issue post-1.0.0. The question is whether
the
While developers may appreciate "1.0 == API stability," I'm not sure that will
be the understanding of the VP who gives the green light to a Spark-based
development effort.
I fear a bug that silently produces erroneous results will be perceived like
the FDIV bug, but in this case without the mo
I would make the case for interface stability, not just API stability.
Particularly given that we have significantly changed some of our
interfaces, I want to ensure developers/users are not seeing red flags.
Bugs and code stability can be addressed in minor releases if found, but
behavioral change
We do actually have replicated StorageLevels in Spark. You can use
MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
replication factor.
BTW you guys should probably have this discussion on the JIRA rather than the
dev list; I think the replies somehow ended up on the de
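A minimal sketch of both options Matei mentions; the four-argument
StorageLevel factory shown here (useDisk, useMemory, deserialized,
replication) matches the 1.0-era API, so check it against your version:

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext("local", "replication-demo")

    // Built-in level with a replication factor of 2:
    sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_AND_DISK_2)

    // Custom level with 3x replication
    // (useDisk = true, useMemory = true, deserialized = false, replication = 3):
    val threeWay = StorageLevel(true, true, false, 3)
    sc.parallelize(1 to 1000).persist(threeWay)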
BTW for what it’s worth I agree this is a good option to add, the only tricky
thing will be making sure the checkpoint blocks are not garbage-collected by
the block store. I don’t think they will be though.
Matei
On May 17, 2014, at 2:20 PM, Matei Zaharia wrote:
> We do actually have replicate
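For reference, the existing reliable-checkpointing API that this proposal
would complement looks roughly like this, assuming sc is a live SparkContext;
the HDFS paths are illustrative:

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")
    val data = sc.textFile("hdfs:///data/input").cache()
    data.checkpoint()   // marked here; written out by the first action
    data.count()        // materializes the RDD and its checkpoint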
On 18-May-2014 1:45 am, "Mark Hamstra" wrote:
>
> I'm not trying to muzzle the discussion. All I am saying is that we don't
> need to have the same discussion about 0.10 vs. 1.0 that we already had.
Agreed, no point in repeating the same discussion ... I am also trying to
understand what the con
As others have said, the 1.0 milestone is about API stability, not about saying
“we’ve eliminated all bugs”. The sooner you declare 1.0, the sooner users can
confidently build on Spark, knowing that the application they build today will
still run on Spark 1.9.9 three years from now. This is some
I'm not trying to muzzle the discussion. All I am saying is that we don't
need to have the same discussion about 0.10 vs. 1.0 that we already had.
If you can tell me about specific changes in the current release candidate
that occasion new arguments for why a 1.0 release is an unacceptable idea,
+1
Reran my tests from rc5:
* Built the release from source.
* Compiled Java and Scala apps that interact with HDFS against it.
* Ran them in local mode.
* Ran them against a pseudo-distributed YARN cluster in both yarn-client
mode and yarn-cluster mode.
On Sat, May 17, 2014 at 10:08 AM, Andrew
On 17-May-2014 11:40 pm, "Mark Hamstra" wrote:
>
> That is a past issue that we don't need to be re-opening now. The present
Huh? If we need to revisit based on changed circumstances, we must - the
scope of changes introduced in this release was definitely not anticipated
when 1.0 vs 0.10 discu
+1 on the running commentary here, non-binding of course :-)
On Sat, May 17, 2014 at 8:44 AM, Andrew Ash wrote:
> +1 on the next release feeling more like a 0.10 than a 1.0
> On May 17, 2014 4:38 AM, "Mridul Muralidharan" wrote:
>
> > I had echoed similar sentiments a while back when there was
That is a past issue that we don't need to be re-opening now. The present
issue, and what I am asking, is which pending bug fixes does anyone
anticipate will require breaking the public API guaranteed in rc9?
On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan wrote:
> We made incompatible api
+1
2014-05-17 8:53 GMT-07:00 Mark Hamstra:
> +1
>
>
> On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote:
>
> > I'll start the voting with a +1.
> >
> > On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell
> > wrote:
> > > Please vote on releasing the following candidate as Apache Spark
>
We made incompatible API changes whose impact we don't yet completely know,
both from an implementation and a usage point of view.
We had the option of getting real-world feedback from the user community if
we had gone to 0.10, but the Spark developers seemed to be in a hurry to get
to 1.0 - so I made
On Sat, May 17, 2014 at 4:52 PM, Mark Hamstra wrote:
> Which of the unresolved bugs in spark-core do you think will require an
> API-breaking change to fix? If there are none of those, then we are still
> essentially on track for a 1.0.0 release.
I don't have a particular one in mind, but look a
+1 on the next release feeling more like a 0.10 than a 1.0
On May 17, 2014 4:38 AM, "Mridul Muralidharan" wrote:
> I had echoed similar sentiments a while back when there was a discussion
> around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
> changes, add missing functionalit
+1
On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote:
> I'll start the voting with a +1.
>
> On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell
> wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> 1.0.0!
> > This has one bug fix and one minor feature on t
Which of the unresolved bugs in spark-core do you think will require an
API-breaking change to fix? If there are none of those, then we are still
essentially on track for a 1.0.0 release.
The number of contributions and pace of change now is quite high, but I
don't think that waiting for the pace
I suspect this is an issue we have fixed internally here as part of a
larger change - the issue we fixed was not a config issue but bugs in Spark.
Unfortunately we plan to contribute this as part of 1.1
Regards,
Mridul
On 17-May-2014 4:09 pm, "sam (JIRA)" wrote:
> sam created SPARK-1867:
>
I had echoed similar sentiments a while back when there was a discussion
around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
changes, add missing functionality, go through a hardening release before
1.0
But the community preferred a 1.0 :-)
Regards,
Mridul
On 17-May-2014 3:19
On this note, non-binding commentary:
Releases happen in local minima of change, usually created by
internally enforced code freeze. Spark is incredibly busy now due to
external factors -- recently a TLP, recently discovered by a large new
audience, ease of contribution enabled by GitHub. It's get
We don't have 3x replication in Spark :-)
And while using a replicated StorageLevel decreases the odds of failure, it
does not eliminate it (since we are not doing a great job with replication
anyway from a fault-tolerance point of view).
Also it does take a nontrivial performance hit with replicated
Can you try moving your mapPartitions to another class/object which is
referenced only after sc.addJar?
I would suspect the ClassNotFoundException is coming while loading the class
containing mapPartitions before addJar is executed.
In general though, dynamic loading of classes means you use reflection to
instantia
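A hedged sketch of the restructuring being suggested; the object names, jar
path, and input argument are illustrative. The point is that the class whose
closure runs in mapPartitions is not referenced until after sc.addJar has
shipped the jar:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object LateLoaded {
      // Only loaded when first referenced, i.e. after addJar below.
      def lineLengths(rdd: RDD[String]): RDD[Int] =
        rdd.mapPartitions(_.map(_.length))
    }

    object Main {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "addjar-demo")
        sc.addJar("/path/to/dependency.jar")  // ship the jar first...
        val lengths = LateLoaded.lineLengths(sc.textFile(args(0)))  // ...then touch the class
        println(lengths.count())
      }
    }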
Please vote on releasing the following candidate as Apache Spark version 1.0.0!
This has one bug fix and one minor feature on top of rc8:
SPARK-1864: https://github.com/apache/spark/pull/808
SPARK-1808: https://github.com/apache/spark/pull/799
The tag to be voted on is v1.0.0-rc9 (commit 920f947):
I'll start the voting with a +1.
On Sat, May 17, 2014 at 12:58 AM, Patrick Wendell wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 1.0.0!
> This has one bug fix and one minor feature on top of rc8:
> SPARK-1864: https://github.com/apache/spark/pull/808
> SPARK
Cancelled in favor of rc9.
On Sat, May 17, 2014 at 12:51 AM, Patrick Wendell wrote:
> Due to the issue discovered by Michael, this vote is cancelled in favor of
> rc9.
>
> On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust
> wrote:
>> -1
>>
>> We found a regression in the way configuration is pa
Due to the issue discovered by Michael, this vote is cancelled in favor of rc9.
On Fri, May 16, 2014 at 6:22 PM, Michael Armbrust
wrote:
> -1
>
> We found a regression in the way configuration is passed to executors.
>
> https://issues.apache.org/jira/browse/SPARK-1864
> https://github.com/apache