Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Qiao, Richard
For the question in your example, the answer is yes. “For example, if an application wanted 4 executors (spark.executor.instances=4) but the spark cluster can only provide 1 executor. This means that I will only receive 1 onExecutorAdded event. Will the application state change to RUNNI
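
A minimal sketch, assuming a Scala application, of counting onExecutorAdded events with a SparkListener; the expected count of 4 and the println are only for illustration:

import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded}

// Counts executor registrations as the cluster hands them out.
class ExecutorCountListener(expected: Int) extends SparkListener {
  @volatile private var added = 0
  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = {
    added += 1
    println(s"Executors added so far: $added of $expected")
  }
}

// Register on the driver before the work starts:
// sc.addSparkListener(new ExecutorCountListener(4))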

Re: Programmatically get status of job (WAITING/RUNNING)

2017-12-07 Thread Qiao, Richard
For #2, do you mean “RUNNING” showing in the “Driver” table? If so, that is not a problem: the driver does run even while no executor is available, and that combination is itself a status you can catch – driver running while no executors. Comparing #1 and #3, my understanding of “submitted” is “the jar is submi
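
If this is the standalone cluster manager, one way to catch the “driver running, no executors” state from outside is to poll the master’s REST status endpoint; a rough sketch, where the host, port and driver ID are placeholders:

// Placeholder master host/port and submission ID.
val statusUrl = "http://spark-master:6066/v1/submissions/status/driver-20171207123456-0001"
val json = scala.io.Source.fromURL(statusUrl).mkString
println(json) // the "driverState" field reports SUBMITTED, RUNNING, FINISHED, ...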

Re: Do I need to do .collect inside forEachRDD

2017-12-07 Thread Qiao, Richard
: "Qiao, Richard" Cc: Gerard Maas , "user @spark" Subject: Re: Do I need to do .collect inside forEachRDD Hi Richard, I had tried your sample code now and several times in the past as well. The problem seems to be kafkaProducer is not serializable. so I get "Task not seria

Re: Do I need to do .collect inside forEachRDD

2017-12-07 Thread Qiao, Richard
ali Date: Thursday, December 7, 2017 at 2:30 AM To: Gerard Maas Cc: "Qiao, Richard" , "user @spark" Subject: Re: Do I need to do .collect inside forEachRDD @Richard I had pasted the two versions of the code below and I still couldn't figure out why it wouldn'
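
Another way to avoid shipping the producer from the driver is to create it inside foreachPartition on the executor; a sketch in which the DStream, broker address and topic name are all placeholders:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

def writeToKafka(dstream: DStream[String]): Unit = {
  dstream.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Built here, on the executor, so nothing non-serializable crosses the wire.
      val props = new Properties()
      props.put("bootstrap.servers", "localhost:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      partition.foreach(msg => producer.send(new ProducerRecord[String, String]("out-topic", msg)))
      producer.close()
    }
  }
}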

Re: unable to connect to cluster 2.2.0

2017-12-06 Thread Qiao, Richard
Are you now building your app using Spark 2.2 or 2.1? Best Regards Richard From: Imran Rajjad Date: Wednesday, December 6, 2017 at 2:45 AM To: "user @spark" Subject: unable to connect to cluster 2.2.0 Hi, Recently upgraded from 2.1.1 to 2.2.0. My Streaming job seems to have broken
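
If the job was built against 2.1.1, a likely fix is simply rebuilding it against the cluster’s version; a build.sbt sketch, assuming a Scala 2.11 build and only the core and streaming modules:

// build.sbt – keep the compile-time Spark version aligned with the 2.2.0 cluster
val sparkVersion = "2.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
)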

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread Qiao, Richard
In the 2nd case, is there any producer error thrown in the executor’s log? Best Regards Richard From: kant kodali Date: Tuesday, December 5, 2017 at 4:38 PM To: "Qiao, Richard" Cc: "user @spark" Subject: Re: Do I need to do .collect inside forEachRDD Reads from Kafka and
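
One way to make producer failures visible in the executor log, assuming the plain Kafka producer API is used inside the foreachRDD, is to attach a callback to each send; the producer, topic and message names here are placeholders:

import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

def sendLogged(producer: KafkaProducer[String, String], msg: String): Unit =
  producer.send(new ProducerRecord[String, String]("out-topic", msg), new Callback {
    override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit = {
      // Runs on the executor, so a failure lands in the executor's log, not the driver's.
      if (exception != null) println(s"Kafka send failed: ${exception.getMessage}")
    }
  })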

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread Qiao, Richard
Where do you check the output result for both cases? Sent from my iPhone > On Dec 5, 2017, at 15:36, kant kodali wrote: > > Hi All, > > I have a simple stateless transformation using Dstreams (stuck with the old > API for one of the Applications). The pseudo code is rough like this > > dstream

Re: Access to Applications metrics

2017-12-04 Thread Qiao, Richard
It works to collect job-level metrics through the Jolokia Java agent. Best Regards Richard From: Nick Dimiduk Date: Monday, December 4, 2017 at 6:53 PM To: "user@spark.apache.org" Subject: Re: Access to Applications metrics Bump. On Wed, Nov 15, 2017 at 2:28 PM, Nick Dimiduk <ndimi...@gmail.com>
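
A rough sketch of attaching the Jolokia JVM agent to the executors so their JMX metrics can be scraped over HTTP; the jar path, port and app name are placeholders, and a driver-side agent would typically be passed via --driver-java-options on spark-submit instead:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("metrics-demo") // placeholder
  .set("spark.executor.extraJavaOptions",
       "-javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0")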

Re: Add snappy support for spark in Windows

2017-12-04 Thread Qiao, Richard
Junfeng, if you haven’t done so yet, it is worth trying to start your local Spark with the Hadoop Windows support package (hadoop.dll, winutils.exe, etc.) in HADOOP_HOME. Best Regards Richard From: Junfeng Chen Date: Monday, December 4, 2017 at 3:53 AM To: "Qiao, Richard" Cc: "user@s
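
For a local run on Windows, a minimal sketch of pointing Spark at the Hadoop support binaries before the SparkContext is created; C:\hadoop is a placeholder path whose bin folder is assumed to contain winutils.exe and hadoop.dll:

// Alternatively, set the HADOOP_HOME environment variable and add %HADOOP_HOME%\bin to PATH.
System.setProperty("hadoop.home.dir", "C:\\hadoop")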

Re: Add snappy support for spark in Windows

2017-12-04 Thread Qiao, Richard
A common mistake is that the path is not accessible by the workers/executors. Best regards Richard Sent from my iPhone On Dec 3, 2017, at 22:32, Junfeng Chen <darou...@gmail.com> wrote: I am working on importing a snappy-compressed JSON file into a Spark RDD or Dataset. However I meet t

Re: Dynamic Resource allocation in Spark Streaming

2017-12-03 Thread Qiao, Richard
Sourav: I’m using Spark Streaming 2.1.0 and can confirm spark.dynamicAllocation.enabled is enough. Best Regards Richard From: Sourav Mazumder Date: Sunday, December 3, 2017 at 12:31 PM To: user Subject: Dynamic Resource allocation in Spark Streaming Hi, I see the following jir
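
A sketch of the relevant settings, assuming the external shuffle service is available on the cluster; the min/max executor counts are placeholders:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // required by dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")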

Re: [Spark streaming] No assigned partition error during seek

2017-12-01 Thread Qiao, Richard
In your case, it looks like two versions of Kafka end up in the same JVM at runtime, so there is a version conflict. Regarding “I don’t find the spark async commit useful for our needs”, do you mean code like the line below? kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(ranges) B
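
For reference, the usual shape of the async commit in the spark-streaming-kafka-0-10 integration, sketched with kafkaDStream as the placeholder stream and the processing step elided:

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

kafkaDStream.foreachRDD { rdd =>
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... process rdd here ...
  kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
}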