I have only used client mode, with both the 1.3 and 1.4 versions, on Mesos.
I skimmed through
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala.
I would actually backport the Cluster Mode feature. Sorry, I don't have an
answer for this.
On
Yes.
Sent from my iPhone
> On 19 Jul, 2015, at 10:52 pm, "Jahagirdar, Madhu"
> wrote:
>
> All,
>
> Can we run different versions of Spark using the same Mesos Dispatcher? For
> example, can we run drivers with Spark 1.3 and Spark 1.4 at the same time?
>
> Regards,
> Madhu Jahagirdar
>
> Th
The PR to fix this is out.
https://github.com/apache/spark/pull/7519
On Sun, Jul 19, 2015 at 6:41 PM, Tathagata Das wrote:
> I am taking care of this right now.
>
> On Sun, Jul 19, 2015 at 6:08 PM, Patrick Wendell
> wrote:
>
>> I think we should just revert this patch on all affected branches.
I am taking care of this right now.
On Sun, Jul 19, 2015 at 6:08 PM, Patrick Wendell wrote:
> I think we should just revert this patch on all affected branches. No
> reason to leave the builds broken until a fix is in place.
>
> - Patrick
>
> On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen wrote:
>
I think we should just revert this patch on all affected branches. No
reason to leave the builds broken until a fix is in place.
- Patrick
On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen wrote:
> Yep, I emailed TD about it; I think that we may need to make a change to the
> pull request builder to f
Yep, I emailed TD about it; I think that we may need to make a change to
the pull request builder to fix this. Pending that, we could just revert
the commit that added this.
On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu wrote:
> Hi,
> I noticed that KinesisStreamSuite fails for both hadoop profiles i
Hi,
I noticed that KinesisStreamSuite fails for both hadoop profiles in master
Jenkins builds.
From
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console
:
KinesisStreamSuite:
*** RUN ABORTED ***
java.lang.AssertionError: asser
Hi Juan,
It's exactly what I meant. If we have a high load with many repetitions, it
can significantly reduce the RDD size and improve performance. In real use
cases an application frequently needs to enrich data from a cache or an
external system, so we would save time on each repetition.
I will also do some
Hi,
My two cents is that this could be interesting if all RDD and pair
RDD operations were lifted to work on a grouped RDD. For example, as
suggested, a map on grouped RDDs would be more efficient if the original RDD
had lots of duplicate entries, but for RDDs with few repetitions I guess
you in
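A minimal sketch of that point, assuming a SparkContext named sc, a toy input, and a made-up expensiveEnrich helper standing in for a costly cache or external-system lookup: mapping over the (element, count) pairs runs the expensive function once per distinct element rather than once per record.

val rdd = sc.parallelize(Seq("Spark", "Spark", "Spark", "Flink"))   // toy input; sc is assumed
// Hypothetical stand-in for an expensive lookup against a cache or external system.
def expensiveEnrich(s: String): String = s.toUpperCase

val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _)                 // RDD[(String, Int)]
// expensiveEnrich is invoked once per distinct element, not once per original record.
val enriched = groupedRdd.map { case (elem, count) => (expensiveEnrich(elem), count) }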
In the Spark model, constructing an RDD does not mean storing all its
contents in memory. Rather, an RDD is a description of a dataset that
enables iterating over its contents, record by record (in parallel). The
only time the full contents of an RDD are stored in memory is when a user
explicitly caches it.
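A minimal sketch of that laziness, assuming a SparkContext named sc and a made-up input path: the transformations below only describe the dataset, and nothing is read or held in memory until an action runs.

val lines = sc.textFile("hdfs:///some/large/input")   // hypothetical path; nothing is read yet
val upper = lines.map(_.toUpperCase)                   // still just a lineage description
val n = upper.count()                                  // the action streams records through the map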
Sorry, maybe I am saying something completely wrong... we have a stream,
and we discretize it to create an RDD. The RDD in this case will be just an array
of Any. Then we apply a transformation to create the new grouped RDD, and GC should
remove the original RDD from memory (if we don't persist it). Will we have a GC step in
The user gets to choose what they want to reside in memory. If they call
rdd.cache() on the original RDD, it will be in memory. If they call
rdd.cache() on the compact RDD, it will be in memory. If cache() is called
on both, they'll both be in memory.
-Sandy
On Sun, Jul 19, 2015 at 11:09 AM, С
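A small sketch of that choice, assuming a SparkContext named sc and a toy input: both RDDs can be marked for caching independently, and each is materialized in memory the first time an action computes it.

val rdd = sc.parallelize(Seq("Spark", "Spark", "Spark", "Flink"))
val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _)

rdd.cache()          // keep the original records in memory once they are computed
groupedRdd.cache()   // keep the compact (element, count) pairs in memory as well

groupedRdd.count()   // this first action materializes both cached RDDs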
Thanks for the answer! Could you please answer one more question? Will we
have both the original RDD and the grouped RDD in memory at the same time?
2015-07-19 21:04 GMT+03:00 Sandy Ryza :
> Edit: the first line should read:
>
> val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _)
>
> On Sun, Jul 19, 2015 at
This functionality already basically exists in Spark. To create the
"grouped RDD", one can run:
val groupedRdd = rdd.reduceByKey(_ + _)
To get it back into the original form:
groupedRdd.flatMap(x => List.fill(x._2)(x._1))
-Sandy
On Sun, Jul 19, 2015 at 10:40 AM, Сергей Лихоман
wr
Edit: the first line should read:
val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _)
On Sun, Jul 19, 2015 at 11:02 AM, Sandy Ryza
wrote:
> This functionality already basically exists in Spark. To create the
> "grouped RDD", one can run:
>
> val groupedRdd = rdd.reduceByKey(_ + _)
>
> To g
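Putting the correction together with the restore step, a self-contained sketch (assuming a SparkContext named sc and a toy input) might look like the following; with (element, count) pairs, the count is the second field.

val rdd = sc.parallelize(Seq("Spark", "Spark", "Spark", "Flink"))

// Compact form: one (element, count) pair per distinct element.
val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _)

// Back to the original form: emit `count` copies of each element.
val restored = groupedRdd.flatMap { case (elem, count) => List.fill(count)(elem) }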
Hi,
I am looking for a suitable issue for a Master's degree project (it sounds like
scalability problems and improvements for Spark Streaming), and it seems like
the introduction of a grouped RDD (for example: don't store
"Spark", "Spark", "Spark"; instead store ("Spark", 3)) could:
1. Reduce the memory needed for an RDD (
+1
On Sat, Jul 18, 2015 at 4:00 PM, Mridul Muralidharan
wrote:
> Thanks for detailing, definitely sounds better.
> +1
>
> Regards
> Mridul
>
> On Saturday, July 18, 2015, Reynold Xin wrote:
>
>> A single commit message consisting of:
>>
>> 1. Pull request title (which includes JIRA number and c
Sean B.,
Thank you for giving a thorough reply. I will work with Sean O. and
see what we can change to make us more in line with the stated policy.
I did some research and it appears that some time between October [1]
and December [2] 2006, this page was modified to include stricter
policy surrou
Hey Sean,
One other thing I'd be okay doing is moving the main text about
nightly builds to the wiki and just having a header called "Nightly
builds" at the end of the downloads page that says "For developers,
Spark maintains nightly builds. More information is available on the
[Spark developer Wiki](
I am going to make an edit to the download page on the web site to
start, as that much seems uncontroversial. Proposed change:
Reorder sections to put developer-oriented sections at the bottom,
including the info on nightly builds:
Download Spark
Link with Spark
All Releases
Spark Source C