spark-submit only runs when you run the first paragraph using the spark interpreter. After that, each paragraph sends its code to the already-running Spark app to execute.
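If you want to confirm this behavior on the Zeppelin host, a quick check like the one below can help. This is only a sketch: the assumption is that the launcher shows up in the process table under the name `SparkSubmit`, which may differ across Spark versions.

```shell
#!/bin/sh
# Hedged sketch: count driver/launcher processes on the Zeppelin host.
# Assumption: the spark-submit-launched driver appears as "SparkSubmit" in ps.
count_drivers() {
  pgrep -f 'SparkSubmit' | wc -l
}

# Run once before and once after executing the first %spark paragraph;
# in client mode the count should go from 0 to 1 on this host and then
# stay at 1 as further paragraphs reuse the same app.
echo "drivers running: $(count_drivers)"
```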
>>> Also spark standalone cluster mode should work even before this new
>>> release, right?

I didn't verify that, and I'm not sure whether other people have verified it.

ankit jain <ankitjain....@gmail.com> wrote on Thu, Mar 15, 2018 at 4:32 AM:

> Also spark standalone cluster mode should work even before this new
> release, right?
>
> On Wed, Mar 14, 2018 at 8:43 AM, ankit jain <ankitjain....@gmail.com>
> wrote:
>
>> Hi Jhang,
>> Not clear on that - I thought spark-submit was done when we run a
>> paragraph, how does the .sh file come into play?
>>
>> Thanks
>> Ankit
>>
>> On Tue, Mar 13, 2018 at 5:43 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> spark-submit is called in bin/interpreter.sh. I didn't try standalone
>>> cluster mode. It is expected to run the driver on a separate host, but
>>> it is not guaranteed that Zeppelin supports this.
>>>
>>> Ankit Jain <ankitjain....@gmail.com> wrote on Wed, Mar 14, 2018 at
>>> 8:34 AM:
>>>
>>>> Hi Jhang,
>>>> What is the expected behavior with standalone cluster mode? Should we
>>>> see separate driver processes in the cluster (one per user) or
>>>> multiple SparkSubmit processes?
>>>>
>>>> I was trying to dig into the Zeppelin code & didn't see where Zeppelin
>>>> does the spark-submit to the cluster. Can you please point to it?
>>>>
>>>> Thanks
>>>> Ankit
>>>>
>>>> On Mar 13, 2018, at 5:25 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> ZEPPELIN-2898 <https://issues.apache.org/jira/browse/ZEPPELIN-2898> is
>>>> for yarn-cluster mode. Zeppelin has an integration test for yarn mode,
>>>> so that is guaranteed to work. But there is no test for standalone, so
>>>> I'm not sure about the behavior of standalone mode.
>>>>
>>>> Ruslan Dautkhanov <dautkha...@gmail.com> wrote on Wed, Mar 14, 2018 at
>>>> 8:06 AM:
>>>>
>>>>> https://github.com/apache/zeppelin/pull/2577 pronounces yarn-cluster
>>>>> in its title, so I assume it's only yarn-cluster.
>>>>> Never used standalone-cluster myself.
>>>>>
>>>>> Which distro of Hadoop do you use?
>>>>> Cloudera desupported standalone in CDH 5.5 and will remove it in CDH 6.
>>>>> https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_deprecated.html
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>> On Tue, Mar 13, 2018 at 5:45 PM, Jhon Anderson Cardenas Diaz <
>>>>> jhonderson2...@gmail.com> wrote:
>>>>>
>>>>>> Does this new feature work only for yarn-cluster? Or for spark
>>>>>> standalone too?
>>>>>>
>>>>>> On Tue, Mar 13, 2018 at 18:34, Ruslan Dautkhanov <
>>>>>> dautkha...@gmail.com> wrote:
>>>>>>
>>>>>>> > Zeppelin version: 0.8.0 (merged at September 2017 version)
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2898 was merged at
>>>>>>> the end of September, so I'm not sure if you have that.
>>>>>>>
>>>>>>> Check out
>>>>>>> https://medium.com/@zjffdu/zeppelin-0-8-0-new-features-ea53e8810235
>>>>>>> for how to set this up.
>>>>>>>
>>>>>>> --
>>>>>>> Ruslan Dautkhanov
>>>>>>>
>>>>>>> On Tue, Mar 13, 2018 at 5:24 PM, Jhon Anderson Cardenas Diaz <
>>>>>>> jhonderson2...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi zeppelin users!
>>>>>>>>
>>>>>>>> I am working with zeppelin pointing to a Spark standalone cluster.
>>>>>>>> I am trying to figure out a way to make zeppelin run the spark
>>>>>>>> driver outside of the client process that submits the application.
>>>>>>>>
>>>>>>>> According to the documentation
>>>>>>>> (http://spark.apache.org/docs/2.1.1/spark-standalone.html):
>>>>>>>>
>>>>>>>> *For standalone clusters, Spark currently supports two deploy
>>>>>>>> modes. In client mode, the driver is launched in the same process
>>>>>>>> as the client that submits the application. In cluster mode,
>>>>>>>> however, the driver is launched from one of the Worker processes
>>>>>>>> inside the cluster, and the client process exits as soon as it
>>>>>>>> fulfills its responsibility of submitting the application without
>>>>>>>> waiting for the application to finish.*
>>>>>>>>
>>>>>>>> The problem is that, even when I set the properties for the spark
>>>>>>>> standalone cluster and the deploy mode to cluster, the driver
>>>>>>>> still runs inside the zeppelin machine (according to the Spark
>>>>>>>> UI/executors page). These are the properties that I am setting for
>>>>>>>> the spark interpreter:
>>>>>>>>
>>>>>>>> master: spark://<master-name>:7077
>>>>>>>> spark.submit.deployMode: cluster
>>>>>>>> spark.executor.memory: 16g
>>>>>>>>
>>>>>>>> Any ideas would be appreciated.
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Details:
>>>>>>>> Spark version: 2.1.1
>>>>>>>> Zeppelin version: 0.8.0 (merged at September 2017 version)
>>
>> --
>> Thanks & Regards,
>> Ankit.
>
> --
> Thanks & Regards,
> Ankit.
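For reference, the interpreter settings quoted above correspond to a plain spark-submit invocation like the one below. This is only a sketch of what a standalone cluster-mode submission looks like outside Zeppelin; the application JAR path and main class are placeholders, and as noted in the thread, Zeppelin launches the driver through bin/interpreter.sh, so it is not guaranteed to behave the same way against a standalone master.

```shell
# Sketch only: equivalent standalone cluster-mode submission done by hand.
# <master-name>, the main class, and the JAR path are placeholders; in
# standalone cluster mode the JAR must be reachable from the workers.
spark-submit \
  --master spark://<master-name>:7077 \
  --deploy-mode cluster \
  --conf spark.executor.memory=16g \
  --class com.example.YourApp \
  /path/to/your-app.jar
```

With `--deploy-mode cluster` the driver should appear on one of the Worker hosts in the Spark UI rather than on the submitting machine; if it still shows up on the Zeppelin host, the deploy-mode property is likely not reaching the actual spark-submit call.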