How can I get the application belonging to the driver?

2016-12-26 Thread John Fang
I hope I can get the application by the driverId, but I can't find a REST API for this in Spark. How can I get the application which belongs to one driver?
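Not an answer from the thread, just context for readers: the monitoring REST API that Spark does expose lists applications, but it is not keyed by driverId, so this is only the closest endpoint I know of. A minimal sketch, assuming the driver UI is reachable on localhost:4040:

    import scala.io.Source

    object ListApplicationsSketch {
      def main(args: Array[String]): Unit = {
        // Fetch the JSON list of applications from the driver's monitoring REST API.
        val json = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
        println(json)
      }
    }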

RE: Shuffle intermediate results not being cached

2016-12-26 Thread assaf.mendelson
The reason I thought some operations would be reused is the fact that Spark automatically caches shuffle data, which means the partial aggregation for previous dataframes would be saved. Unfortunately, as Mark Hamstra explained, this is not the case because this is considered a new RDD and therefore
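For readers following along, here is a minimal sketch (not from the thread) of the behaviour under discussion: shuffle output is only reused when the very same RDD object is reused, while rebuilding an identical-looking pipeline produces a new RDD and re-runs the shuffle.

    import org.apache.spark.sql.SparkSession

    object ShuffleReuseSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("shuffle-reuse-sketch").getOrCreate()
        val sc = spark.sparkContext

        // A shuffled RDD: reusing this exact object lets later jobs skip its shuffle map stage.
        val shuffled = sc.parallelize(1 to 1000).map(i => (i % 10, 1)).reduceByKey(_ + _)
        shuffled.count()   // first job: runs the shuffle and writes shuffle files
        shuffled.collect() // second job on the SAME RDD: the map stage shows up as skipped

        // An identical-looking but new RDD does not reuse those shuffle files.
        val rebuilt = sc.parallelize(1 to 1000).map(i => (i % 10, 1)).reduceByKey(_ + _)
        rebuilt.count()

        spark.stop()
      }
    }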

Spark on Mesos: it seemed the Spark dispatcher didn't abort when authorization failed

2016-12-26 Thread Yu Wei
Hi guys, when running some cases with Spark on Mesos, it seemed that the Spark dispatcher didn't abort when authorization failed. It seemed that the dispatcher detected the error but did not handle it properly. The detailed log is below: 16/12/26 16:02:08 INFO Utils: Successfully started s

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Chawla,Sumit
What is the expected effect of reducing mesosExecutor.cores to zero? What functionality of the executor is impacted? Is the impact just that it behaves like a regular process? Regards, Sumit Chawla On Mon, Dec 26, 2016 at 9:25 AM, Michael Gummelt wrote: > > Using 0 for spark.mesos.mesos

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Thanks a LOT, Michael! Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, Dec 26, 2016 at 10:04 PM, Michael Gummelt wrote: > In fine-grained mode (which is

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Michael Gummelt
In fine-grained mode (which is deprecated), Spark tasks (which are threads) were implemented as Mesos tasks. When a Mesos task starts and stops, its underlying cgroup, and therefore the resources it's consuming on the cluster, grows or shrinks based on the resources allocated to the tasks, which in
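To make the configuration being discussed concrete, here is a minimal sketch (not from the thread) of how fine-grained mode is selected: spark.mesos.coarse defaults to true, and setting it to false opts into the deprecated fine-grained mode. The master URL is a placeholder.

    import org.apache.spark.SparkConf

    // Sketch only: selecting the deprecated fine-grained Mesos mode.
    val fineGrainedConf = new SparkConf()
      .setAppName("mesos-fine-grained-sketch")
      .setMaster("mesos://zk://host:2181/mesos")  // placeholder master URL
      .set("spark.mesos.coarse", "false")         // fine-grained mode (deprecated)
      // Extra cores reserved for the executor itself, on top of task cores;
      // setting this to 0 is the option debated later in the thread.
      .set("spark.mesos.mesosExecutor.cores", "0")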

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Hi Michael, That caught my attention... Could you please elaborate on "elastically grow and shrink CPU usage" and how it really works under the covers? It seems that CPU usage is just a "label" for an executor on Mesos. Where's this in the code? Regards, Jacek Laskowski https://medium.co

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Michael Gummelt
> Using 0 for spark.mesos.mesosExecutor.cores is better than dynamic allocation. Maybe for CPU, but definitely not for memory. Executors never shut down in fine-grained mode, which means you only elastically grow and shrink CPU usage, not memory. On Sat, Dec 24, 2016 at 10:14 PM, Davies Liu wrot
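For contrast, a minimal sketch (not from the thread) of the coarse-grained alternative Michael is comparing against: with dynamic allocation, idle executors can be released entirely, freeing their memory as well as CPU. The master URL is again a placeholder.

    import org.apache.spark.SparkConf

    // Sketch only: coarse-grained mode (the default) with dynamic allocation.
    val dynamicAllocConf = new SparkConf()
      .setAppName("mesos-dynamic-allocation-sketch")
      .setMaster("mesos://zk://host:2181/mesos")     // placeholder master URL
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")  // external shuffle service, required by dynamic allocation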

Re: Shuffle intermediate results not being cached

2016-12-26 Thread Liang-Chi Hsieh
Hi, let me quote your example code:

    var totalTime: Long = 0
    var allDF: org.apache.spark.sql.DataFrame = null
    for { x <- dataframes } {
      val timeLen = time {
        allDF = if (allDF == null) x else allDF.union(x)
        val grouped = allDF.groupBy("cat1", "cat2").agg(sum($"valToAdd").alias("v"))
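The quoted snippet is cut off in the archive; the following is a completed, self-contained sketch of the same loop, under the assumption that `time` is a simple timing helper and `dataframes` is just a sequence of small batches (both are hypothetical stand-ins, as is the toy data):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.sum

    object ManualStreamingSketch {
      // Hypothetical timing helper, assumed by the quoted snippet: returns elapsed milliseconds.
      def time[T](body: => T): Long = {
        val start = System.nanoTime()
        body
        (System.nanoTime() - start) / 1000000
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("manual-streaming-sketch").getOrCreate()
        import spark.implicits._

        // Stand-in for the thread's `dataframes`: a sequence of small batches.
        val dataframes: Seq[DataFrame] = (1 to 5).map { i =>
          Seq((s"a$i", "x", i.toLong)).toDF("cat1", "cat2", "valToAdd")
        }

        var totalTime: Long = 0
        var allDF: DataFrame = null
        for (x <- dataframes) {
          val timeLen = time {
            allDF = if (allDF == null) x else allDF.union(x)
            val grouped = allDF.groupBy("cat1", "cat2").agg(sum($"valToAdd").alias("v"))
            grouped.collect() // re-evaluates the whole union lineage each iteration
          }
          totalTime += timeLen
          println(s"iteration took $timeLen ms")
        }
        println(s"total: $totalTime ms")
        spark.stop()
      }
    }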

Re: Sharing data in columnar storage between two applications

2016-12-26 Thread Mark Hamstra
Yes, this is part of Matei's current research, for which code is not yet publicly available at all, much less in a form suitable for production use. On Mon, Dec 26, 2016 at 2:29 AM, Evan Chan wrote: > Looks pretty interesting, but might take a while honestly. > > On Dec 25, 2016, at 5:24 PM, Mar

Re: Shuffle intermediate results not being cached

2016-12-26 Thread Mark Hamstra
Shuffle results are only reused if you are reusing the exact same RDD. If you are working with DataFrames that you have not explicitly cached, then they are going to be producing new RDDs within their physical plan creation and evaluation, so you won't get implicit shuffle reuse. This is what htt
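A minimal sketch (not from the thread) of the explicit caching Mark is pointing at: persisting the accumulated DataFrame so that each iteration builds on the materialized previous result instead of re-running the whole union lineage. The helper name is hypothetical.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.storage.StorageLevel

    // Hypothetical helper: fold the next batch into the running DataFrame,
    // caching each generation and dropping the previous one.
    def accumulate(allDF: DataFrame, next: DataFrame): DataFrame = {
      val combined = if (allDF == null) next else allDF.union(next)
      val cached = combined.persist(StorageLevel.MEMORY_AND_DISK)
      cached.count()                        // materialize so the cache is populated now
      if (allDF != null) allDF.unpersist()  // release the previous generation
      cached
    }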

Shuffle intermediate results not being cached

2016-12-26 Thread assaf.mendelson
Hi, sorry to be bothering everyone on the holidays, but I have found what may be a bug. I am doing "manual" streaming (see http://stackoverflow.com/questions/41266956/apache-spark-streaming-performance for the specific code) where I essentially read an additional dataframe each time from fil

Re: Spark structured streaming from Kafka - last message processed again after resume from checkpoint

2016-12-26 Thread Niek
Cool thanks! -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-structured-steaming-from-kafka-last-message-processed-again-after-resume-from-checkpoint-tp20353p20357.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com