The env names in interpreter.json and interpreter-setting.json are not used. We should remove them.
I still don't understand how master & spark.master would affect the behavior. `master` is a legacy setting that we introduced a very long time ago; we should definitely use spark.master instead. But internally we do translate master to spark.master, so I am not sure why it would cause this issue; maybe it is a bug.

Alex Ott <alex...@gmail.com> wrote on Sun, May 17, 2020 at 9:36 PM:

> I've seen somewhere in CDH documentation that they use MASTER, that's why
> I'm asking...
>
> On Sun, May 17, 2020 at 3:13 PM Patrik Iselind <patrik....@gmail.com> wrote:
>
>> Thanks a lot for creating the issue. It seems I am not allowed to.
>>
>> As I understand it, the environment variable is supposed to be SPARK_MASTER.
>>
>> On Sun, May 17, 2020 at 11:56 AM Alex Ott <alex...@gmail.com> wrote:
>>
>>> Ok, I've created a JIRA for it:
>>> https://issues.apache.org/jira/browse/ZEPPELIN-4821 and am working on a patch.
>>>
>>> I'm not sure about the environment variable name - it's simply MASTER.
>>> Should it be `SPARK_MASTER`, or is it a requirement of CDH and other
>>> Hadoop distributions to have it as MASTER?
>>>
>>> On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <patrik....@gmail.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Thanks a lot for helping out with this.
>>>>
>>>> You're correct, but it doesn't seem that it's the interpreter-settings.json
>>>> for the Spark interpreter that is being used. It's conf/interpreter.json.
>>>> In this file both 0.8.2 and 0.9.0 have
>>>> ```partial-json
>>>> "spark": {
>>>>   "id": "spark",
>>>>   "name": "spark",
>>>>   "group": "spark",
>>>>   "properties": {
>>>>     "SPARK_HOME": {
>>>>       "name": "SPARK_HOME",
>>>>       "value": "",
>>>>       "type": "string",
>>>>       "description": "Location of spark distribution"
>>>>     },
>>>>     "master": {
>>>>       "name": "master",
>>>>       "value": "local[*]",
>>>>       "type": "string",
>>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
>>>>     },
>>>> ```
>>>> That "master" should be "spark.master".
>>>>
>>>> By adding an explicit spark.master with the value "local[*]" I can use
>>>> all cores as expected. Without it, printing sc.master gives "local";
>>>> with the spark.master property set to "local[*]", printing sc.master
>>>> gives "local[*]". My conclusion is that conf/interpreter.json isn't in
>>>> sync with the interpreter-settings.json for the Spark interpreter.
>>>>
>>>> Best regards,
>>>> Patrik Iselind
>>>>
>>>> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:
>>>>
>>>>> Spark master is set to `local[*]` by default. Here is the corresponding
>>>>> piece from interpreter-settings.json for the Spark interpreter:
>>>>>
>>>>>     "master": {
>>>>>       "envName": "MASTER",
>>>>>       "propertyName": "spark.master",
>>>>>       "defaultValue": "local[*]",
>>>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>>>>>       "type": "string"
>>>>>     },
>>>>>
>>>>> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>>>> PI> Hi Jeff,
>>>>>
>>>>> PI> I've tried the release from http://zeppelin.apache.org/download.html,
>>>>> PI> both in a docker and without a docker. They both have the same issue
>>>>> PI> as previously described.
>>>>>
>>>>> PI> Can I somehow set spark.master to "local[*]" in zeppelin, perhaps
>>>>> PI> using some environment variable?
>>>>>
>>>>> PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
>>>>>
>>>>> PI> Best Regards,
>>>>> PI> Patrik Iselind
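For the question just above about what spark.master actually ends up in effect: a minimal check that can be run from a %pyspark paragraph, using only the sc handle that Zeppelin injects (sc.master, sc.defaultParallelism and getConf all appear in the dir(sc) output quoted later in the thread; the "<not set>" fallback string is just a placeholder):

```Zeppelin paragraph
%pyspark
# Minimal sketch: confirm what master setting the Spark interpreter actually applied.
# "sc" is the SparkContext that Zeppelin injects into %pyspark paragraphs.
print(sc.master)                                      # e.g. "local" vs "local[*]"
print(sc.defaultParallelism)                          # number of cores actually in use
print(sc.getConf().get("spark.master", "<not set>"))  # effective spark.master value
```

If this prints "local" even though the interpreter setting shows local[*], the workaround quoted earlier in the thread (adding an explicit spark.master property with the value local[*]) applies.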
>>>>> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>
>>>>> PI> Hi Patrik,
>>>>> PI>
>>>>> PI> Do you mind trying the 0.9.0-preview? It might be an issue of the
>>>>> PI> docker container.
>>>>> PI>
>>>>> PI> http://zeppelin.apache.org/download.html
>>>>>
>>>>> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sun, May 10, 2020 at 2:30 AM:
>>>>> PI>
>>>>> PI> Hello Jeff,
>>>>> PI>
>>>>> PI> Thank you for looking into this for me.
>>>>> PI>
>>>>> PI> Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb,
>>>>> PI> built 6 weeks ago), I still see the same issue. My image has the digest
>>>>> PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>>>> PI>
>>>>> PI> If it's not on the tip of master, could you guys please release a
>>>>> PI> newer 0.9.0 image?
>>>>> PI>
>>>>> PI> Best Regards,
>>>>> PI> Patrik Iselind
>>>>>
>>>>> PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>> PI>
>>>>> PI> This might be a bug in 0.8; I tried that in 0.9 (master branch) and it
>>>>> PI> works for me.
>>>>> PI>
>>>>> PI> print(sc.master)
>>>>> PI> print(sc.defaultParallelism)
>>>>> PI>
>>>>> PI> ---
>>>>> PI> local[*]
>>>>> PI> 8
>>>>>
>>>>> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:
>>>>> PI>
>>>>> PI> Hi,
>>>>> PI>
>>>>> PI> First comes some background, then I have some questions.
>>>>> PI>
>>>>> PI> Background
>>>>> PI> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Docker file
>>>>> PI> looks like this:
>>>>> PI>
>>>>> PI> ```Dockerfile
>>>>> PI> FROM apache/zeppelin:0.8.2
>>>>> PI>
>>>>> PI> # Install Java and some tools
>>>>> PI> RUN apt-get -y update &&\
>>>>> PI>     DEBIAN_FRONTEND=noninteractive \
>>>>> PI>     apt -y install vim python3-pip
>>>>> PI>
>>>>> PI> RUN python3 -m pip install -U pyspark
>>>>> PI>
>>>>> PI> ENV PYSPARK_PYTHON python3
>>>>> PI> ENV PYSPARK_DRIVER_PYTHON python3
>>>>> PI> ```
>>>>> PI>
>>>>> PI> When I start a section like so
>>>>> PI>
>>>>> PI> ```Zeppelin paragraph
>>>>> PI> %pyspark
>>>>> PI>
>>>>> PI> print(sc)
>>>>> PI> print()
>>>>> PI> print(dir(sc))
>>>>> PI> print()
>>>>> PI> print(sc.master)
>>>>> PI> print()
>>>>> PI> print(sc.defaultParallelism)
>>>>> PI> ```
>>>>> PI>
>>>>> PI> I get the following output
>>>>> PI>
>>>>> PI> ```output
>>>>> PI> <SparkContext master=local appName=Zeppelin>
>>>>> PI>
>>>>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>>>> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>>>> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>>>> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>>>> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>>>> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
>>>>> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
>>>>> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
>>>>> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator',
>>>>> PI> '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
>>>>> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
>>>>> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
>>>>> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
>>>>> PI> 'cancelJobGroup', 'defaultMinPartitions',
>>>>> PI> 'defaultParallelism', 'dump_profiles', 'emptyRDD', 'environment',
>>>>> PI> 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile', 'hadoopRDD',
>>>>> PI> 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize',
>>>>> PI> 'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
>>>>> PI> 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup',
>>>>> PI> 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles',
>>>>> PI> 'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop',
>>>>> PI> 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>>>> PI>
>>>>> PI> local
>>>>> PI>
>>>>> PI> 1
>>>>> PI> ```
>>>>> PI>
>>>>> PI> This even though the "master" property in the interpreter is set to
>>>>> PI> "local[*]". I'd like to use all cores on my machine. To do that I have
>>>>> PI> to explicitly create the "spark.master" property in the Spark
>>>>> PI> interpreter with the value "local[*]"; then I get
>>>>> PI>
>>>>> PI> ```new output
>>>>> PI> <SparkContext master=local[*] appName=Zeppelin>
>>>>> PI>
>>>>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>>>> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>>>> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>>>> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>>>> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>>>> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
>>>>> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
>>>>> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
>>>>> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator',
>>>>> PI> '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
>>>>> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
>>>>> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
>>>>> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
>>>>> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
>>>>> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
>>>>> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
>>>>> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
>>>>> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>>>>> PI> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>>>>> PI> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>>>>> PI> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>>>>> PI> 'version', 'wholeTextFiles']
>>>>> PI>
>>>>> PI> local[*]
>>>>> PI>
>>>>> PI> 8
>>>>> PI> ```
>>>>> PI> This is what I want.
>>>>> PI>
>>>>> PI> The Questions
>>>>> PI> - Why is the "master" property not used in the created SparkContext?
>>>>> PI> - How do I add the spark.master property to the docker image?
>>>>> PI>
>>>>> PI> Any hint or support you can provide would be greatly appreciated.
>>>>> PI>
>>>>> PI> Yours Sincerely,
>>>>> PI> Patrik Iselind
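On the second question above (adding spark.master to the docker image): one approach consistent with what this thread found is to write an explicit spark.master property into conf/interpreter.json before Zeppelin starts, mirroring what was done by hand in the interpreter settings UI. The sketch below is illustrative only; the /zeppelin path and the "interpreterSettings" wrapper key are assumptions about the apache/zeppelin:0.8.2 image layout, not something verified in this thread, while the property object shape follows the partial JSON quoted earlier.

```python
# Hypothetical build/startup step: patch conf/interpreter.json so the Spark
# interpreter has an explicit "spark.master" property set to local[*].
import json

CONF_PATH = "/zeppelin/conf/interpreter.json"  # assumed location inside the image

with open(CONF_PATH) as f:
    conf = json.load(f)

# Assumed layout: {"interpreterSettings": {"spark": {"properties": {...}}}}
props = conf["interpreterSettings"]["spark"]["properties"]
props.setdefault("spark.master", {
    "name": "spark.master",
    "value": "local[*]",
    "type": "string",
    "description": "Spark master uri",
})

with open(CONF_PATH, "w") as f:
    json.dump(conf, f, indent=2)
```

Whether the MASTER environment variable from the envName mapping quoted earlier is actually honored is exactly what is in question in this thread, so patching the property directly is the more conservative route.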
>>>>> PI> --
>>>>> PI> Best Regards
>>>>> PI>
>>>>> PI> Jeff Zhang
>>>>>
>>>>> PI> --
>>>>> PI> Best Regards
>>>>> PI>
>>>>> PI> Jeff Zhang
>>>>>
>>>>> --
>>>>> With best wishes, Alex Ott
>>>>> http://alexott.net/
>>>>> Twitter: alexott_en (English), alexott (Russian)
>>>
>>> --
>>> With best wishes, Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)

--
Best Regards

Jeff Zhang