Cool. Using Ambari to monitor and scale up/down the cluster sounds
promising. Thanks for the pointer!
Mingyu
From: Deepak Sharma
Date: Monday, December 14, 2015 at 1:53 AM
To: cs user
Cc: Mingyu Kim, "user@spark.apache.org"
Subject: Re: Autoscaling of Spark YARN cluster
An
review”, and I didn’t find much
else from my search.
This might be a general YARN question, but I wanted to check if there’s a
solution popular in the Spark community. Any sharing of experience around
autoscaling would be helpful!
Thanks,
Mingyu
/SPARK-3996. Would
this be reasonable?
Mingyu
On 10/7/15, 11:26 AM, "Marcelo Vanzin" wrote:
>Seems like you might be running into
>https://issues.apache.org/jira/browse/SPARK-10910
Cool, we will start from there. Thanks Aaron and Josh!
Darin, it’s likely because the DirectOutputCommitter is compiled with
Hadoop 1 classes and you’re running it with Hadoop 2.
org.apache.hadoop.mapred.JobContext used to be a class in Hadoop 1, and it
became an interface in Hadoop 2.
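To make the incompatibility concrete, here is a minimal committer sketch
compiled against the Hadoop 2 interface (illustrative only, not the actual
DirectOutputCommitter source):

  import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

  // Compiles against Hadoop 2, where JobContext is an interface. The same
  // source built against Hadoop 1 (where JobContext was a class) produces
  // bytecode that fails on Hadoop 2 with IncompatibleClassChangeError.
  class DirectOutputCommitter extends OutputCommitter {
    override def setupJob(jobContext: JobContext): Unit = {}
    override def setupTask(taskContext: TaskAttemptContext): Unit = {}
    // No task-side commit: tasks write straight to the final location (e.g. S3).
    override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
    override def commitTask(taskContext: TaskAttemptContext): Unit = {}
    override def abortTask(taskContext: TaskAttemptContext): Unit = {}
  }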
Mingyu
I didn’t get any response. It’d be really appreciated if anyone using a special
OutputCommitter for S3 could comment on this!
Thanks,
Mingyu
From: Mingyu Kim <m...@palantir.com>
Date: Monday, February 16, 2015 at 1:15 AM
To: "user@spark.apache.org<mailto:user@sp
n with Spark.
Thanks,
Mingyu
I found a workaround.
I can make my auxiliary data an RDD, partition it, and cache it.
Later, I can cogroup it with other RDDs, and Spark will try to keep the
cached RDD partitions where they are rather than shuffle them.
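A minimal sketch of the workaround (keys, values, and partition count are
made up):

  import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("sticky-partitions"))

  // Hypothetical auxiliary data, keyed so it can be partitioned and cached.
  val aux = sc.parallelize(Seq(1 -> "auxA", 2 -> "auxB"))
    .partitionBy(new HashPartitioner(16))
    .cache()
  aux.count() // materialize the cached partitions

  // cogroup reuses aux's partitioner, so the cached aux partitions stay
  // where they are; only the other side gets shuffled into place.
  val events = sc.parallelize(Seq(1 -> "e1", 2 -> "e2"))
  val joined = aux.cogroup(events)
  joined.collect().foreach(println)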
Also, setting spark.locality.wait=100 did not work for me.
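(For reference, this is how I set it, assuming a Spark 1.x-era deployment
where the value is in milliseconds:)

  import org.apache.spark.{SparkConf, SparkContext}

  // How long the scheduler waits for a data-local slot before falling
  // back to a less-local one.
  val conf = new SparkConf()
    .setAppName("locality-wait-test")
    .set("spark.locality.wait", "100") // 100 ms instead of the 3000 ms default
  val sc = new SparkContext(conf)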
mount partition-specific auxiliary
data for processing the stream. I noticed that the partitions move among the
nodes. I cannot afford to move the large auxiliary data around.
Thanks,
Mingyu
Ok, cool. This seems to be a general issue in the JVM with very large heaps. I
agree that the best workaround would be to keep the heap size below 32GB.
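A sketch of the kind of configuration this implies (sizes are hypothetical;
the point is staying under the ~32GB compressed-oops threshold):

  import org.apache.spark.SparkConf

  // Below ~32GB the JVM can use compressed oops (32-bit object references);
  // larger heaps double the pointer overhead and tend to amplify GC pauses.
  val conf = new SparkConf()
    .set("spark.executor.memory", "30g") // hypothetical size, kept under 32g
    .set("spark.executor.extraJavaOptions", "-verbose:gc -XX:+PrintGCDetails")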
Thanks guys!
Mingyu
From: Arun Ahuja
Date: Monday, October 6, 2014 at 7:50 AM
To: Andrew Ash
Cc: Mingyu Kim, "user@spark.apache.org"
heap.
(i.e., spark.executor.memory=50g). And when we checked the CPU usage, there
were just a lot of GCs going on.
Has anyone seen a similar problem?
Thanks,
Mingyu
That makes sense. Thanks everyone for the explanations!
Mingyu
From: Matei Zaharia
Reply-To: "user@spark.apache.org"
Date: Tuesday, July 15, 2014 at 3:00 PM
To: "user@spark.apache.org"
Subject: Re: How does Spark speculation prevent duplicated work?
Yeah, this is ha
level?
Mingyu
From: Bertrand Dechoux
Reply-To: "user@spark.apache.org"
Date: Tuesday, July 15, 2014 at 1:22 PM
To: "user@spark.apache.org"
Subject: Re: How does Spark speculation prevent duplicated work?
I haven't looked at the implementation, but what you would
actions are
not idempotent. For example, it may count a partition twice in the case of
RDD.count, or write a partition to HDFS twice in the case of
RDD.save*(). How does it prevent this kind of duplicated work?
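For context, a sketch of the speculation settings in question, with my
understanding of the answer as comments (to be confirmed):

  import org.apache.spark.SparkConf

  // Speculation re-launches slow tasks, so two attempts of the same
  // partition may run concurrently.
  val conf = new SparkConf()
    .set("spark.speculation", "true")
    .set("spark.speculation.multiplier", "1.5") // how much slower than the median counts as slow
  // For count(), the driver only accepts the first successful result per
  // partition and discards later attempts. For saves, each attempt writes
  // to its own attempt directory and only one attempt's output is committed
  // to the final location.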
Mingyu
.scala:1207)
>
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Mingyu
Cool. Thanks for the note. Looking forward to it.
Mingyu
From: Andrew Ash
Reply-To: "user@spark.apache.org"
Date: Friday, June 20, 2014 at 9:54 AM
To: "user@spark.apache.org"
Subject: Re: 1.0.1 release plan
Sounds good. Mingyu and I are waiting on 1.0.1 to get t
Hi all,
Is there any plan for a 1.0.1 release?
Mingyu
.
(And sort is really expensive.) On the other hand, if I can assume that, say,
“filter” or “map” doesn’t shuffle the rows around, I can do the sort once
and assume that the order is retained throughout such operations, saving a
lot of time by avoiding unnecessary sorts.
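A small sketch of the pattern I mean (data is made up):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("sort-once"))
  val records = sc.parallelize(Seq(3 -> 10, 1 -> -5, 2 -> 7))

  // sortBy shuffles once; filter and map are narrow transformations that
  // keep each partition's rows in order, so the sort never has to be redone.
  val sorted = records.sortBy(_._1)                 // expensive: one shuffle
  val kept   = sorted.filter(_._2 > 0)              // order preserved
  val shaped = kept.map { case (k, v) => s"$k,$v" } // order preserved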
Mingyu
From: Mark Hamstra
Reply
Okay, that makes sense. It’d be great if this could be better documented at
some point, because the only way to find out about the resulting RDD row
order is by looking at the code.
Thanks for the discussion!
Mingyu
On 4/29/14, 11:59 PM, "Patrick Wendell" wrote:
>I don't
union two RDDs, for example, rdd1 = [“a, b,
c”], rdd2 = [“1, 2, 3”, “4, 5, 6”], then
rdd1.union(rdd2).saveAsTextFile(…) should’ve resulted in a file with three
lines “a, b, c”, “1, 2, 3”, and “4, 5, 6” because the partitions from the
two RDDs are concatenated.
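Concretely, the example above as runnable code (output path is hypothetical):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("union-order"))
  val rdd1 = sc.parallelize(Seq("a, b, c"), numSlices = 1)
  val rdd2 = sc.parallelize(Seq("1, 2, 3", "4, 5, 6"), numSlices = 1)

  // union concatenates the two partition lists, so rdd1's partition comes
  // first and the saved part files keep that order.
  rdd1.union(rdd2).saveAsTextFile("/tmp/union-order") // hypothetical path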
Mingyu
On 4/29/14, 10:55 PM
I’m not
sure why union doesn’t respect the order, because the union operation simply
concatenates the two lists of partitions from the two RDDs.
Mingyu
On 4/29/14, 10:25 PM, "Patrick Wendell" wrote:
>You are right, once you sort() the RDD, then yes it has a well-defined
>ordering.
() because map
preserves the partition order. RDD order is also what allows me to get the
top k out of an RDD by doing RDD.sort().take().
Am I misunderstanding it? Or is it just when an RDD is written to disk that
the order is not well preserved? Thanks in advance!
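For example, the top-k idiom I rely on (sortBy standing in for the sort()
I mentioned; data is made up):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("top-k"))
  val nums = sc.parallelize(Seq(5, 1, 9, 3, 7))

  // Sort descending, then take() walks the sorted partitions in order,
  // so the first k elements really are the k largest.
  val topK = nums.sortBy(x => -x).take(3) // Array(9, 7, 5)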
Mingyu
On 1/22/14, 4:46 PM, "Pa
design? Is this a bug?
Mingyu