Yes. Thanks for your clarification. The problem I encountered is that in yarn cluster mode, no output for "DStream.print()" in yarn logs.
In spark implementation org/apache/spark/streaming/dstream/DStream.scala, the logs related with "Time" was printed out. However, other information for firstNum.take(num).foreach(println) was not printed in logs. What's the root cause for the behavior difference? /** * Print the first ten elements of each RDD generated in this DStream. This is an output * operator, so this DStream will be registered as an output stream and there materialized. */ def print(): Unit = ssc.withScope { print(10) } /** * Print the first num elements of each RDD generated in this DStream. This is an output * operator, so this DStream will be registered as an output stream and there materialized. */ def print(num: Int): Unit = ssc.withScope { def foreachFunc: (RDD[T], Time) => Unit = { (rdd: RDD[T], time: Time) => { val firstNum = rdd.take(num + 1) // scalastyle:off println println("-------------------------------------------") println("Time: " + time) println("-------------------------------------------") firstNum.take(num).foreach(println) if (firstNum.length > num) println("...") println() // scalastyle:on println } } Thanks, Jared ________________________________ From: Rabin Banerjee <dev.rabin.baner...@gmail.com> Sent: Thursday, July 7, 2016 1:04 PM To: Yu Wei Cc: Mich Talebzadeh; Deng Ching-Mallete; user@spark.apache.org Subject: Re: Is that possible to launch spark streaming application on yarn with only one machine? In yarn cluster mode , Driver is running in AM , so you can find the logs in that AM log . Open rersourcemanager UI , and check for the Job and logs. or yarn logs -applicationId <appId> In yarn client mode , the driver is the same JVM from where you are launching ,,So you are getting it in the log . On Thu, Jul 7, 2016 at 7:56 AM, Yu Wei <yu20...@hotmail.com<mailto:yu20...@hotmail.com>> wrote: Launching via client deploy mode, it works again. I'm still a little confused about the behavior difference for cluster and client mode on a single machine. Thanks, Jared ________________________________ From: Mich Talebzadeh <mich.talebza...@gmail.com<mailto:mich.talebza...@gmail.com>> Sent: Wednesday, July 6, 2016 9:46:11 PM To: Yu Wei Cc: Deng Ching-Mallete; user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Is that possible to launch spark streaming application on yarn with only one machine? Deploy-mode cluster don't think will work. Try --master yarn --deploy-mode client FYI * Spark Local - Spark runs on the local host. This is the simplest set up and best suited for learners who want to understand different concepts of Spark and those performing unit testing. * Spark Standalone – a simple cluster manager included with Spark that makes it easy to set up a cluster. * YARN Cluster Mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. This is invoked with –master yarn and --deploy-mode cluster * YARN Client Mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. Unlike Spark standalone mode, in which the master’s address is specified in the --master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration. Thus, the --master parameter is yarn. This is invoked with --deploy-mode client HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 6 July 2016 at 12:31, Yu Wei <yu20...@hotmail.com<mailto:yu20...@hotmail.com>> wrote: Hi Deng, I tried the same code again. It seemed that when launching application via yarn on single node, JavaDStream.print() did not work. However, occasionally it worked. If launch the same application in local mode, it always worked. The code is as below, SparkConf conf = new SparkConf().setAppName("Monitor&Control"); JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1)); JavaReceiverInputDStream<String> inputDS = MQTTUtils.createStream(jssc, "tcp://114.55.145.185:1883<http://114.55.145.185:1883>", "Control"); inputDS.print(); jssc.start(); jssc.awaitTermination(); Command for launching via yarn, (did not work) spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar Command for launching via local mode (works) spark-submit --master local[4] --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar Any advice? Thanks, Jared ________________________________ From: Yu Wei <yu20...@hotmail.com<mailto:yu20...@hotmail.com>> Sent: Tuesday, July 5, 2016 4:41 PM To: Deng Ching-Mallete Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Is that possible to launch spark streaming application on yarn with only one machine? Hi Deng, Thanks for the help. Actually I need pay more attention to memory usage. I found the root cause in my problem. It seemed that it existed in spark streaming MQTTUtils module. When I use "localhost" in brokerURL, it doesn't work. After change it to "127.0.0.1", it works now. Thanks again, Jared ________________________________ From: odeach...@gmail.com<mailto:odeach...@gmail.com> <odeach...@gmail.com<mailto:odeach...@gmail.com>> on behalf of Deng Ching-Mallete <och...@apache.org<mailto:och...@apache.org>> Sent: Tuesday, July 5, 2016 4:03:28 PM To: Yu Wei Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Is that possible to launch spark streaming application on yarn with only one machine? Hi Jared, You can launch a Spark application even with just a single node in YARN, provided that the node has enough resources to run the job. It might also be good to note that when YARN calculates the memory allocation for the driver and the executors, there is an additional memory overhead that is added for each executor then it gets rounded up to the nearest GB, IIRC. So the 4G driver-memory + 4x2G executor memory do not necessarily translate to a total of 12G memory allocation. It would be more than that, so the node would need to have more than 12G of memory for the job to execute in YARN. You should be able to see something like "No resources available in cluster.." in the application master logs in YARN if that is the case. HTH, Deng On Tue, Jul 5, 2016 at 4:31 PM, Yu Wei <yu20...@hotmail.com<mailto:yu20...@hotmail.com>> wrote: Hi guys, I set up pseudo hadoop/yarn cluster on my labtop. I wrote a simple spark streaming program as below to receive messages with MQTTUtils. conf = new SparkConf().setAppName("Monitor&Control"); jssc = new JavaStreamingContext(conf, Durations.seconds(1)); JavaReceiverInputDStream<String> inputDS = MQTTUtils.createStream(jssc, brokerUrl, topic); inputDS.print(); jssc.start(); jssc.awaitTermination() If I submitted the app with "--master local[2]", it works well. spark-submit --master local[4] --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar If I submitted with "--master yarn", no output for "inputDS.print()". spark-submit --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar Is it possible to launch spark application on yarn with only one single node? Thanks for your advice. Jared