I hope I can get the application by the driverId, but I can't find such a REST
API in Spark. How can I get the application that belongs to a given driver?
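(A minimal sketch using Spark's documented monitoring REST API, which lists
applications from a running driver UI or the history server; as far as I know
there is no dispatcher endpoint keyed directly by driverId, and the host and
port below are placeholders, not values from this thread.)

import scala.io.Source

// Sketch only: list applications via the monitoring REST API
// (/api/v1/applications on a driver UI or the history server) and filter
// the result yourself. "history-server" and port 18080 are assumptions.
object ListApplications {
  def main(args: Array[String]): Unit = {
    val json = Source.fromURL("http://history-server:18080/api/v1/applications").mkString
    println(json) // inspect the JSON for the application you are after
  }
}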
The reason I thought some operations would be reused is the fact that Spark
automatically caches shuffle data, which means the partial aggregation for
previous dataframes would be saved. Unfortunately, as Mark Hamstra explained,
this is not the case, because this is considered a new RDD and therefore
Hi Guys,
When running some test cases for Spark on Mesos, it seemed that the Spark
dispatcher did not abort when authorization failed.
It appeared that the dispatcher detected the error but did not handle it
properly.
The detailed log is as below:
16/12/26 16:02:08 INFO Utils: Successfully started s
What is the expected effect of reducing mesosExecutor.cores to zero?
What executor functionality is impacted? Is the impact just that it
behaves like a regular process?
Regards
Sumit Chawla
On Mon, Dec 26, 2016 at 9:25 AM, Michael Gummelt
wrote:
> > Using 0 for spark.mesos.mesos
Thanks a LOT, Michael!
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Mon, Dec 26, 2016 at 10:04 PM, Michael Gummelt
wrote:
> In fine-grained mode (which is
In fine-grained mode (which is deprecated), Spark tasks (which are threads)
were implemented as Mesos tasks. When a Mesos task starts and stops, its
underlying cgroup, and therefore the resources it is consuming on the
cluster, grow or shrink based on the resources allocated to the tasks,
which in
Hi Michael,
That caught my attention...
Could you please elaborate on "elastically grow and shrink CPU usage"
and how it really works under the covers? It seems that CPU usage is
just a "label" for an executor on Mesos. Where's this in the code?
Regards,
Jacek Laskowski
https://medium.co
> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic
allocation
Maybe for CPU, but definitely not for memory. Executors never shut down in
fine-grained mode, which means you only elastically grow and shrink CPU
usage, not memory.
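A minimal configuration sketch of the two options being compared; the master
URL is a placeholder, and only the config keys come from the Spark-on-Mesos
documentation:

import org.apache.spark.sql.SparkSession

// Fine-grained mode with no CPUs reserved for the idle executor.
val fineGrained = SparkSession.builder()
  .master("mesos://zk://mesos-master:2181/mesos")  // placeholder master URL
  .config("spark.mesos.coarse", "false")           // fine-grained mode (deprecated)
  .config("spark.mesos.mesosExecutor.cores", "0")  // executor itself reserves no CPUs
  .getOrCreate()

// Coarse-grained alternative with dynamic allocation (requires the external
// shuffle service); unlike fine-grained mode, removing executors frees
// memory as well as CPU:
// .config("spark.mesos.coarse", "true")
// .config("spark.dynamicAllocation.enabled", "true")
// .config("spark.shuffle.service.enabled", "true")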
On Sat, Dec 24, 2016 at 10:14 PM, Davies Liu wrot
Hi,
Let me quote your example code:
var totalTime: Long = 0
var allDF: org.apache.spark.sql.DataFrame = null
for {
  x <- dataframes
} {
  val timeLen = time {
    allDF = if (allDF == null) x else allDF.union(x)
    val grouped = allDF.groupBy("cat1", "cat2")
      .agg(sum($"valToAdd").alias("v"))
Yes, this is part of Matei's current research, for which code is not yet
publicly available at all, much less in a form suitable for production use.
On Mon, Dec 26, 2016 at 2:29 AM, Evan Chan wrote:
> Looks pretty interesting, but might take a while honestly.
>
> On Dec 25, 2016, at 5:24 PM, Mar
Shuffle results are only reused if you are reusing the exact same RDD. If
you are working with Dataframes that you have not explicitly cached, then
they are going to be producing new RDDs within their physical plan creation
and evaluation, so you won't get implicit shuffle reuse. This is what
htt
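A minimal sketch of what explicit caching would look like for the quoted loop;
the column names (cat1, cat2, valToAdd) and the overall structure are assumed
from the earlier snippet rather than taken from this reply:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.sum

// Sketch, assuming a Seq[DataFrame] like `dataframes` in the quoted code.
def aggregateWithExplicitCache(dataframes: Seq[DataFrame]): Unit = {
  var allDF: DataFrame = null
  for (x <- dataframes) {
    allDF = if (allDF == null) x else allDF.union(x)
    // Cache the growing union explicitly; without this, each iteration builds
    // a fresh physical plan (new RDDs), so shuffle output is not implicitly reused.
    allDF.cache()
    val grouped = allDF.groupBy("cat1", "cat2").agg(sum("valToAdd").alias("v"))
    grouped.count() // action that materializes the cache for the next iteration
  }
}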
Hi,
Sorry to be bothering everyone over the holidays, but I have found what may be
a bug.
I am doing "manual" streaming (see
http://stackoverflow.com/questions/41266956/apache-spark-streaming-performance
for the specific code), where I essentially read an additional dataframe each
time from fil
Cool thanks!
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-structured-steaming-from-kafka-last-message-processed-again-after-resume-from-checkpoint-tp20353p20357.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com