Thanks a lot for creating the issue; it seems I am not allowed to create one myself. As I understand it, the environment variable is supposed to be SPARK_MASTER.
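A quick way to see which of the two candidate names the interpreter actually honours is to check both from a %pyspark paragraph. This is only a rough sketch in the style of the paragraphs quoted below: it assumes the Zeppelin-provided SparkContext is available as sc, and neither MASTER nor SPARK_MASTER is guaranteed to be set in a given image.

```Zeppelin paragraph
%pyspark
import os

# MASTER and SPARK_MASTER are the two candidate names discussed in this
# thread; either, both, or neither may be set in your container.
for name in ("MASTER", "SPARK_MASTER"):
    print(name, "=", os.environ.get(name, "<not set>"))

# sc is the SparkContext that Zeppelin injects into %pyspark paragraphs.
print("sc.master          =", sc.master)
print("defaultParallelism =", sc.defaultParallelism)
```

Two more sketches follow after the quoted thread, addressing the two questions raised in the original message.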
On Sun, May 17, 2020 at 11:56 AM Alex Ott <alex...@gmail.com> wrote:

> Ok, I've created a JIRA for it:
> https://issues.apache.org/jira/browse/ZEPPELIN-4821 and I'm working on a patch.
>
> I'm not sure about the environment variable name - it's simply MASTER. Should
> it be `SPARK_MASTER`, or is it a requirement of CDH and other Hadoop
> distributions to have it as MASTER?
>
> On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <patrik....@gmail.com> wrote:
>
>> Hi Alex,
>>
>> Thanks a lot for helping out with this.
>>
>> You're correct, but it doesn't seem that it's the interpreter-settings.json
>> for the Spark interpreter that is being used. It's conf/interpreter.json.
>> In this file both 0.8.2 and 0.9.0 have
>>
>> ```partial-json
>> "spark": {
>>   "id": "spark",
>>   "name": "spark",
>>   "group": "spark",
>>   "properties": {
>>     "SPARK_HOME": {
>>       "name": "SPARK_HOME",
>>       "value": "",
>>       "type": "string",
>>       "description": "Location of spark distribution"
>>     },
>>     "master": {
>>       "name": "master",
>>       "value": "local[*]",
>>       "type": "string",
>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
>>     },
>> ```
>>
>> That "master" should be "spark.master".
>>
>> By adding an explicit spark.master with the value "local[*]" I can use all
>> cores as expected. Without it, printing sc.master gives "local". With the
>> spark.master property set to "local[*]", printing sc.master gives "local[*]".
>> My conclusion is that conf/interpreter.json isn't in sync with the
>> interpreter-settings.json for the Spark interpreter.
>>
>> Best regards,
>> Patrik Iselind
>>
>> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:
>>
>>> Spark master is set to `local[*]` by default. Here is the corresponding
>>> piece from interpreter-settings.json for the Spark interpreter:
>>>
>>>     "master": {
>>>       "envName": "MASTER",
>>>       "propertyName": "spark.master",
>>>       "defaultValue": "local[*]",
>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>>>       "type": "string"
>>>     },
>>>
>>> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>>
>>> PI> Hi Jeff,
>>> PI>
>>> PI> I've tried the release from http://zeppelin.apache.org/download.html,
>>> PI> both in a docker and without a docker. They both have the same issue
>>> PI> as previously described.
>>> PI>
>>> PI> Can I somehow set spark.master to "local[*]" in zeppelin, perhaps
>>> PI> using some environment variable?
>>> PI>
>>> PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
>>> PI>
>>> PI> Best Regards,
>>> PI> Patrik Iselind
>>> PI>
>>> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>> PI>
>>> PI> Hi Patric,
>>> PI>
>>> PI> Do you mind trying the 0.9.0-preview? It might be an issue of the
>>> PI> docker container.
>>> PI>
>>> PI> http://zeppelin.apache.org/download.html
>>> PI>
>>> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sun, May 10, 2020 at 2:30 AM:
>>> PI>
>>> PI> Hello Jeff,
>>> PI>
>>> PI> Thank you for looking into this for me.
>>> PI>
>>> PI> Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb,
>>> PI> built 6 weeks ago), I still see the same issue. My image has the digest
>>> PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>> PI>
>>> PI> If it's not on the tip of master, could you guys please release a
>>> PI> newer 0.9.0 image?
>>> PI>
>>> PI> Best Regards,
>>> PI> Patrik Iselind
>>> PI>
>>> PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>> PI>
>>> PI> This might be a bug in 0.8; I tried that in 0.9 (master branch) and it
>>> PI> works for me.
>>> PI>
>>> PI> print(sc.master)
>>> PI> print(sc.defaultParallelism)
>>> PI>
>>> PI> ---
>>> PI> local[*] 8
>>> PI>
>>> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:
>>> PI>
>>> PI> Hi,
>>> PI>
>>> PI> First comes some background, then I have some questions.
>>> PI>
>>> PI> Background
>>> PI> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Docker
>>> PI> file looks like this:
>>> PI>
>>> PI> ```Dockerfile
>>> PI> FROM apache/zeppelin:0.8.2
>>> PI>
>>> PI> # Install Java and some tools
>>> PI> RUN apt-get -y update &&\
>>> PI>     DEBIAN_FRONTEND=noninteractive \
>>> PI>     apt -y install vim python3-pip
>>> PI>
>>> PI> RUN python3 -m pip install -U pyspark
>>> PI>
>>> PI> ENV PYSPARK_PYTHON python3
>>> PI> ENV PYSPARK_DRIVER_PYTHON python3
>>> PI> ```
>>> PI>
>>> PI> When I start a section like so
>>> PI>
>>> PI> ```Zeppelin paragraph
>>> PI> %pyspark
>>> PI>
>>> PI> print(sc)
>>> PI> print()
>>> PI> print(dir(sc))
>>> PI> print()
>>> PI> print(sc.master)
>>> PI> print()
>>> PI> print(sc.defaultParallelism)
>>> PI> ```
>>> PI>
>>> PI> I get the following output
>>> PI>
>>> PI> ```output
>>> PI> <SparkContext master=local appName=Zeppelin>
>>> PI>
>>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
>>> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
>>> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
>>> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator',
>>> PI> '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
>>> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
>>> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
>>> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
>>> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
>>> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
>>> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
>>> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
>>> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile',
>>> PI> 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty',
>>> PI> 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome',
>>> PI> 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile',
>>> PI> 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>> PI>
>>> PI> local
>>> PI>
>>> PI> 1
>>> PI> ```
>>> PI>
>>> PI> This even though the "master" property in the interpreter is set to
>>> PI> "local[*]". I'd like to use all cores on my machine. To do that I have
>>> PI> to explicitly create the "spark.master" property in the spark
>>> PI> interpreter with the value "local[*]", then I get
>>> PI>
>>> PI> ```new output
>>> PI> <SparkContext master=local[*] appName=Zeppelin>
>>> PI>
>>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
>>> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
>>> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
>>> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator',
>>> PI> '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
>>> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
>>> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
>>> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
>>> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
>>> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
>>> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
>>> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
>>> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile',
>>> PI> 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty',
>>> PI> 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome',
>>> PI> 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile',
>>> PI> 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>> PI>
>>> PI> local[*]
>>> PI>
>>> PI> 8
>>> PI> ```
>>> PI>
>>> PI> This is what I want.
>>> PI>
>>> PI> The Questions
>>> PI>
>>> PI> 1. Why is the "master" property not used in the created SparkContext?
>>> PI> 2. How do I add the spark.master property to the docker image?
>>> PI>
>>> PI> Any hint or support you can provide would be greatly appreciated.
>>> PI>
>>> PI> Yours Sincerely,
>>> PI> Patrik Iselind
>>>
>>> PI> --
>>> PI> Best Regards
>>> PI>
>>> PI> Jeff Zhang
>>>
>>> PI> --
>>> PI> Best Regards
>>> PI>
>>> PI> Jeff Zhang
>>>
>>> --
>>> With best wishes, Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
>>
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
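On the first question in the quoted thread ("Why is the master property not used?"), the underlying Spark behaviour is easy to reproduce outside Zeppelin: whatever value ends up in spark.master decides the parallelism, and a bare "local" pins Spark to a single core. Below is a minimal sketch, assuming a plain pyspark installation with no Zeppelin involved; the application name is arbitrary.

```python
# Plain PySpark, outside Zeppelin: spark.master controls the parallelism.
# Change "local[*]" to "local" to reproduce the single-core case from the thread.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")       # all local cores
         .appName("master-check")  # arbitrary name, just for this sketch
         .getOrCreate())

sc = spark.sparkContext
print(sc.master)              # local[*]
print(sc.defaultParallelism)  # number of local cores
spark.stop()
```

So the Zeppelin-side question is really why the interpreter's "master" setting never reaches spark.master, which matches the conclusion above that conf/interpreter.json and interpreter-settings.json are out of sync.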
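On the second question (how to bake spark.master into the Docker image), one possible approach is to patch conf/interpreter.json during the image build, before Zeppelin starts. This is only a sketch under assumptions: the file path and the JSON nesting used below are pieced together from the snippet quoted above, the file may not exist until Zeppelin has been started once, and the layout can differ between 0.8.x and 0.9.x, so verify against your own interpreter.json first.

```python
# Hypothetical build-time helper: add a spark.master property to interpreter.json.
# Path and JSON nesting are assumptions; check them against your own file.
import json

INTERPRETER_JSON = "/zeppelin/conf/interpreter.json"  # assumed location in the image

with open(INTERPRETER_JSON) as f:
    config = json.load(f)

# The quoted interpreter.json snippet shows Spark properties as objects with
# name/value/type/description, so the new entry mirrors that shape.
properties = config["interpreterSettings"]["spark"]["properties"]
properties["spark.master"] = {
    "name": "spark.master",
    "value": "local[*]",
    "type": "string",
    "description": "Spark master uri",
}

with open(INTERPRETER_JSON, "w") as f:
    json.dump(config, f, indent=2)
```

Editing the file by hand and COPYing it into the image achieves the same thing; the script is just a repeatable version of that edit.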