In that case, I suspect that MQTT is not receiving data when you submit in yarn-cluster mode.
Can you please try dumping the data to a text file instead of printing it when submitting in yarn-cluster mode?
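For example, something along these lines (a rough, untested sketch: the HDFS
output path is just a placeholder, and inputDS is the stream from your code
below):

    inputDS.foreachRDD((rdd, time) -> {
        if (!rdd.isEmpty()) {
            // one directory per batch; inspect with: hdfs dfs -ls /tmp/mqtt-debug
            rdd.saveAsTextFile("hdfs:///tmp/mqtt-debug/batch-" + time.milliseconds());
        }
    });

If the files contain data, the receiver is working and the problem is only
where print()'s output ends up; if the directories stay empty, MQTT is not
delivering data in cluster mode.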
On Jul 7, 2016 12:46 PM, "Yu Wei" <yu20...@hotmail.com> wrote:

> Yes. Thanks for your clarification.
>
> The problem I encountered is that in yarn-cluster mode, there is no output
> for "DStream.print()" in the yarn logs.
>
> In the Spark implementation, org/apache/spark/streaming/dstream/DStream.scala,
> the lines related to "Time" were printed out. However, the output of
> firstNum.take(num).foreach(println) was not printed in the logs.
>
> What's the root cause of the behavior difference?
>
> /**
>  * Print the first ten elements of each RDD generated in this DStream. This is an output
>  * operator, so this DStream will be registered as an output stream and there materialized.
>  */
> def print(): Unit = ssc.withScope {
>   print(10)
> }
>
> /**
>  * Print the first num elements of each RDD generated in this DStream. This is an output
>  * operator, so this DStream will be registered as an output stream and there materialized.
>  */
> def print(num: Int): Unit = ssc.withScope {
>   def foreachFunc: (RDD[T], Time) => Unit = {
>     (rdd: RDD[T], time: Time) => {
>       val firstNum = rdd.take(num + 1)
>       // scalastyle:off println
>       println("-------------------------------------------")
>       println("Time: " + time)
>       println("-------------------------------------------")
>       firstNum.take(num).foreach(println)
>       if (firstNum.length > num) println("...")
>       println()
>       // scalastyle:on println
>     }
>   }
>
> Thanks,
>
> Jared
>
> ------------------------------
> *From:* Rabin Banerjee <dev.rabin.baner...@gmail.com>
> *Sent:* Thursday, July 7, 2016 1:04 PM
> *To:* Yu Wei
> *Cc:* Mich Talebzadeh; Deng Ching-Mallete; user@spark.apache.org
> *Subject:* Re: Is that possible to launch spark streaming application on
> yarn with only one machine?
>
> In yarn-cluster mode the driver runs in the AM, so you can find its logs
> in the AM log. Open the ResourceManager UI and check the job and its logs,
> or run: yarn logs -applicationId <appId>
>
> In yarn-client mode the driver is the same JVM you launch from, so the
> output shows up in the local log.
>
> On Thu, Jul 7, 2016 at 7:56 AM, Yu Wei <yu20...@hotmail.com> wrote:
>
>> Launching via client deploy mode, it works again.
>>
>> I'm still a little confused about the behavior difference between
>> cluster and client mode on a single machine.
>>
>> Thanks,
>>
>> Jared
>> ------------------------------
>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>> *Sent:* Wednesday, July 6, 2016 9:46:11 PM
>> *To:* Yu Wei
>> *Cc:* Deng Ching-Mallete; user@spark.apache.org
>> *Subject:* Re: Is that possible to launch spark streaming application on
>> yarn with only one machine?
>>
>> I don't think deploy-mode cluster will work.
>>
>> Try --master yarn --deploy-mode client
>>
>> FYI:
>>
>> - *Spark Local* - Spark runs on the local host. This is the simplest
>> setup, best suited for learners who want to understand the different
>> concepts of Spark and for those performing unit testing.
>>
>> - *Spark Standalone* - a simple cluster manager included with Spark that
>> makes it easy to set up a cluster.
>>
>> - *YARN Cluster Mode* - the Spark driver runs inside an application
>> master process which is managed by YARN on the cluster, and the client
>> can go away after initiating the application. This is invoked with
>> --master yarn and --deploy-mode cluster.
>>
>> - *YARN Client Mode* - the driver runs in the client process, and the
>> application master is only used for requesting resources from YARN.
>> Unlike Spark standalone mode, in which the master's address is specified
>> in the --master parameter, in YARN mode the ResourceManager's address is
>> picked up from the Hadoop configuration, so the --master parameter is
>> simply yarn. This is invoked with --deploy-mode client.
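>> For instance, with the jar from this thread, a client-mode submission
>> would look something like this (same memory settings as before, just as
>> an illustration):
>>
>> spark-submit --master yarn --deploy-mode client --driver-memory 4g
>> --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar
>>
>> Since the driver then runs in the launching JVM, the output of
>> DStream.print() appears directly in your terminal instead of in the AM
>> container log.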
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On 6 July 2016 at 12:31, Yu Wei <yu20...@hotmail.com> wrote:
>>
>>> Hi Deng,
>>>
>>> I tried the same code again.
>>>
>>> It seemed that when launching the application via yarn on a single
>>> node, JavaDStream.print() did not work. However, occasionally it worked.
>>>
>>> If I launched the same application in local mode, it always worked.
>>>
>>> The code is as below:
>>>
>>> SparkConf conf = new SparkConf().setAppName("Monitor&Control");
>>> JavaStreamingContext jssc = new JavaStreamingContext(conf,
>>>     Durations.seconds(1));
>>> JavaReceiverInputDStream<String> inputDS =
>>>     MQTTUtils.createStream(jssc, "tcp://114.55.145.185:1883", "Control");
>>> inputDS.print();
>>> jssc.start();
>>> jssc.awaitTermination();
>>>
>>> Command for launching via yarn (did not work):
>>>
>>> spark-submit --master yarn --deploy-mode cluster --driver-memory 4g
>>> --executor-memory 2g target/CollAna-1.0-SNAPSHOT.jar
>>>
>>> Command for launching in local mode (works):
>>>
>>> spark-submit --master local[4] --driver-memory 4g --executor-memory 2g
>>> --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>
>>> Any advice?
>>>
>>> Thanks,
>>>
>>> Jared
>>>
>>> ------------------------------
>>> *From:* Yu Wei <yu20...@hotmail.com>
>>> *Sent:* Tuesday, July 5, 2016 4:41 PM
>>> *To:* Deng Ching-Mallete
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Is that possible to launch spark streaming application
>>> on yarn with only one machine?
>>>
>>> Hi Deng,
>>>
>>> Thanks for the help. Actually I need to pay more attention to memory
>>> usage.
>>>
>>> I found the root cause of my problem. It seemed to be in the Spark
>>> Streaming MQTTUtils module: when I used "localhost" in the brokerURL it
>>> didn't work, but after changing it to "127.0.0.1" it works now.
>>>
>>> Thanks again,
>>>
>>> Jared
>>>
>>> ------------------------------
>>> *From:* odeach...@gmail.com <odeach...@gmail.com> on behalf of Deng
>>> Ching-Mallete <och...@apache.org>
>>> *Sent:* Tuesday, July 5, 2016 4:03:28 PM
>>> *To:* Yu Wei
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Is that possible to launch spark streaming application
>>> on yarn with only one machine?
>>>
>>> Hi Jared,
>>>
>>> You can launch a Spark application even with just a single node in
>>> YARN, provided that the node has enough resources to run the job.
>>>
>>> It might also be good to note that when YARN calculates the memory
>>> allocation for the driver and the executors, an additional memory
>>> overhead is added for each executor, and the request is then rounded up
>>> to the nearest GB, IIRC. So the 4G of driver memory + 4x2G of executor
>>> memory does not necessarily translate to a total allocation of 12G; it
>>> would be more than that, so the node needs more than 12G of memory for
>>> the job to execute in YARN. You should see something like "No resources
>>> available in cluster..." in the application master logs in YARN if that
>>> is the case.
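>>> To put rough numbers on that (assuming the default
>>> spark.yarn.executor.memoryOverhead / spark.yarn.driver.memoryOverhead
>>> of max(384MB, 10% of the heap), and YARN's default minimum allocation
>>> of 1GB that requests are rounded up to): each 2G executor is requested
>>> as 2048MB + 384MB = 2432MB, which rounds up to 3GB, and the 4G driver
>>> as 4096MB + 410MB = 4506MB, which rounds up to 5GB. Four executors plus
>>> the driver therefore need about 4 x 3GB + 5GB = 17GB, not 12GB.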
>>> HTH,
>>> Deng
>>>
>>> On Tue, Jul 5, 2016 at 4:31 PM, Yu Wei <yu20...@hotmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I set up a pseudo-distributed hadoop/yarn cluster on my laptop.
>>>>
>>>> I wrote a simple spark streaming program as below to receive messages
>>>> with MQTTUtils:
>>>>
>>>> conf = new SparkConf().setAppName("Monitor&Control");
>>>> jssc = new JavaStreamingContext(conf, Durations.seconds(1));
>>>> JavaReceiverInputDStream<String> inputDS =
>>>>     MQTTUtils.createStream(jssc, brokerUrl, topic);
>>>> inputDS.print();
>>>> jssc.start();
>>>> jssc.awaitTermination();
>>>>
>>>> If I submitted the app with "--master local[4]", it worked well:
>>>>
>>>> spark-submit --master local[4] --driver-memory 4g --executor-memory 2g
>>>> --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>
>>>> If I submitted with "--master yarn", there was no output for
>>>> "inputDS.print()":
>>>>
>>>> spark-submit --master yarn --deploy-mode cluster --driver-memory 4g
>>>> --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar
>>>>
>>>> Is it possible to launch a spark application on yarn with only a
>>>> single node?
>>>>
>>>> Thanks for your advice.
>>>>
>>>> Jared