Thank you for the clarification, Patrik.

Can you create a JIRA ticket for tracking & fixing this?

Thanks

Patrik Iselind  at "Sat, 16 May 2020 15:45:07 +0200" wrote:
 PI> Hi Alex,

 PI> Thanks a lot for helping out with this.

 PI> You're correct, but it doesn't seem to be interpreter-settings.json for the Spark interpreter that is actually used; it's conf/interpreter.json. In this file, both 0.8.2 and 0.9.0 have:
 PI> ```partial-json
 PI>     "spark": {
 PI>       "id": "spark",
 PI>       "name": "spark",
 PI>       "group": "spark",
 PI>       "properties": {
 PI>         "SPARK_HOME": {
 PI>           "name": "SPARK_HOME",
 PI>           "value": "",
 PI>           "type": "string",
 PI>           "description": "Location of spark distribution"
 PI>         },
 PI>         "master": {
 PI>           "name": "master",
 PI>           "value": "local[*]",
 PI>           "type": "string",
 PI>           "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
 PI>         },
 PI> ```
 PI> That "master" should be "spark.master".

 PI> By adding an explicit spark.master property with the value "local[*]" I can use all cores as expected. Without it, printing sc.master gives "local"; with spark.master set to "local[*]", printing sc.master gives "local[*]". My conclusion is that conf/interpreter.json isn't in sync with interpreter-settings.json for the Spark interpreter.
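
(As a side note, a minimal standalone pyspark sketch, run outside Zeppelin and assuming pyspark is installed as in the Dockerfile quoted further down, illustrates that the key Spark itself honors is "spark.master"; the app name here is arbitrary:)

```python
# Sketch only: Spark takes the master URL from the "spark.master" config key;
# a bare "master" key would not be used as the master URL.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("master-check")
conf.set("spark.master", "local[*]")

sc = SparkContext(conf=conf)
print(sc.master)               # expected: local[*]
print(sc.defaultParallelism)   # expected: number of local cores
sc.stop()
```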

 PI> Best regards,
 PI> Patrik Iselind

 PI> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:

 PI>     Spark master is set to `local[*]` by default. Here is the corresponding piece from interpreter-settings.json for the Spark interpreter:
 PI>    
 PI>           "master": {
 PI>             "envName": "MASTER",
 PI>             "propertyName": "spark.master",
 PI>             "defaultValue": "local[*]",
 PI>             "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
 PI>             "type": "string"
 PI>           },
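
(A quick way to check which value actually reached the driver is a sketch like the following, run in a Zeppelin %pyspark paragraph where sc is the SparkContext Zeppelin provides; "<not set>" is just a fallback label used here:)

```python
# Run in a Zeppelin %pyspark paragraph; sc is provided by the interpreter.
print(sc.master)                                      # effective master URL
print(sc.getConf().get("spark.master", "<not set>"))  # value in the runtime SparkConf
print(sc.defaultParallelism)
```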

 PI>     Patrik Iselind  at "Sun, 10 May 2020 20:31:08 +0200" wrote:
 PI>      PI> Hi Jeff,
 PI>    
 PI>      PI> I've tried the release from http://zeppelin.apache.org/download.html, both in a Docker container and without one. Both have the same issue as previously described.
 PI>    
 PI>      PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using some environment variable?
 PI>    
 PI>      PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
 PI>    
 PI>      PI> Best Regards,
 PI>      PI> Patrik Iselind
 PI>    
 PI>      PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
 PI>    
 PI>      PI>     Hi Patrik,
 PI>      PI>   
 PI>      PI>     Do you mind trying the 0.9.0-preview? It might be an issue with the Docker container.
 PI>      PI>   
 PI>      PI>     http://zeppelin.apache.org/download.html
 PI>    
 PI>      PI>     Patrik Iselind <patrik....@gmail.com> wrote on Sunday, May 10, 2020 at 2:30 AM:
 PI>      PI>   
 PI>      PI>         Hello Jeff,
 PI>      PI>       
 PI>      PI>         Thank you for looking into this for me.
 PI>      PI>       
 PI>      PI>         Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb, built 6 weeks ago), I still see the same issue. My image has the digest "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
 PI>      PI>       
 PI>      PI>         If it's not on the tip of master, could you guys please release a newer 0.9.0 image?
 PI>      PI>       
 PI>      PI>         Best Regards,
 PI>      PI>         Patrik Iselind
 PI>    
 PI>      PI>         On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
 PI>      PI>       
 PI>      PI>             This might be a bug in 0.8; I tried it in 0.9 (master branch) and it works for me.
 PI>      PI>           
 PI>      PI>             print(sc.master)
 PI>      PI>             print(sc.defaultParallelism)
 PI>      PI>           
 PI>      PI>             ---
 PI>      PI>             local[*] 8
 PI>    
 PI>      PI>             Patrik Iselind <patrik....@gmail.com> wrote on Saturday, May 9, 2020 at 8:34 PM:
 PI>      PI>           
 PI>      PI>                 Hi,
 PI>      PI>               
 PI>      PI>                 First comes some background, then I have some questions.
 PI>      PI>               
 PI>      PI>                 Background
 PI>      PI>                 I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile looks like this:
 PI>      PI>               
 PI>      PI>                 ```Dockerfile
 PI>      PI>                 FROM apache/zeppelin:0.8.2
 PI>      PI>               
 PI>      PI>                 # Install some extra tools (vim, pip for Python 3)
 PI>      PI>                 RUN apt-get -y update &&\
 PI>      PI>                     DEBIAN_FRONTEND=noninteractive \
 PI>      PI>                         apt -y install vim python3-pip
 PI>      PI>               
 PI>      PI>                 RUN python3 -m pip install -U pyspark
 PI>      PI>               
 PI>      PI>                 ENV PYSPARK_PYTHON python3
 PI>      PI>                 ENV PYSPARK_DRIVER_PYTHON python3
 PI>      PI>                 ```
 PI>      PI>               
 PI>      PI>                 When I run a paragraph like so
 PI>      PI>               
 PI>      PI>                 ```Zeppelin paragraph
 PI>      PI>                 %pyspark
 PI>      PI>               
 PI>      PI>                 print(sc)
 PI>      PI>                 print()
 PI>      PI>                 print(dir(sc))
 PI>      PI>                 print()
 PI>      PI>                 print(sc.master)
 PI>      PI>                 print()
 PI>      PI>                 print(sc.defaultParallelism)
 PI>      PI>                 ```
 PI>      PI>               
 PI>      PI>                 I get the following output
 PI>      PI>               
 PI>      PI>                 ```output
 PI>      PI>                 <SparkContext master=local appName=Zeppelin>
 PI>      PI>                 ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__',
 PI>      PI>                 '__exit__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
 PI>      PI>                 '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
 PI>      PI>                 '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
 PI>      PI>                 '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized',
 PI>      PI>                 '_gateway', '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
 PI>      PI>                 '_next_accum_id', '_pickled_broadcast_vars', '_python_includes', '_repr_html_', '_temp_dir',
 PI>      PI>                 '_unbatched_serializer', 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
 PI>      PI>                 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
 PI>      PI>                 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
 PI>      PI>                 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
 PI>      PI>                 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup',
 PI>      PI>                 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
 PI>      PI>                 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
 PI>      PI>                 local
 PI>      PI>                 1
 PI>      PI>                 ```
 PI>      PI>               
 PI>      PI>                 This is even though the "master" property in the interpreter is set to "local[*]". I'd like to use all cores on my machine. To do that I have to explicitly create the "spark.master" property in the Spark interpreter with the value "local[*]"; then I get
 PI>      PI>               
 PI>      PI>                 ```new output
 PI>      PI>                 <SparkContext master=local[*] appName=Zeppelin>
 PI>      PI>                 ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__',
 PI>      PI>                 '__exit__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
 PI>      PI>                 '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
 PI>      PI>                 '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
 PI>      PI>                 '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized',
 PI>      PI>                 '_gateway', '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
 PI>      PI>                 '_next_accum_id', '_pickled_broadcast_vars', '_python_includes', '_repr_html_', '_temp_dir',
 PI>      PI>                 '_unbatched_serializer', 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
 PI>      PI>                 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
 PI>      PI>                 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
 PI>      PI>                 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
 PI>      PI>                 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup',
 PI>      PI>                 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime',
 PI>      PI>                 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
 PI>      PI>                 local[*]
 PI>      PI>                 8
 PI>      PI>                 ```
 PI>      PI>                 This is what I want.
 PI>      PI>               
 PI>      PI>                 The Questions
 PI>      PI>                   @ Why is the "master" property not used in the created SparkContext?
 PI>      PI>                   @ How do I add the spark.master property to the docker image?
 PI>      PI>               
 PI>      PI>                 Any hint or support you can provide would be greatly appreciated.
 PI>      PI>               
 PI>      PI>                 Yours Sincerely,
 PI>      PI>                 Patrik Iselind
 PI>    
 PI>      PI>             --
 PI>      PI>             Best Regards
 PI>      PI>           
 PI>      PI>             Jeff Zhang
 PI>    
 PI>      PI>     --
 PI>      PI>     Best Regards
 PI>      PI>   
 PI>      PI>     Jeff Zhang

 PI>     --
 PI>     With best wishes,                    Alex Ott
 PI>     http://alexott.net/
 PI>     Twitter: alexott_en (English), alexott (Russian)



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
