I've seen somewhere in the CDH documentation that they use MASTER; that's why I'm asking...
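In the meantime, for anyone who just wants all local cores before ZEPPELIN-4821 is fixed, here is an untested sketch that sets that environment variable directly in the image from the Dockerfile quoted further down in this thread. Whether the 0.8.2 image actually honors MASTER is part of what is being sorted out here, so treat it as an experiment; the workaround Patrik confirmed (adding an explicit spark.master property in the interpreter settings) remains the known-good path.

```Dockerfile
# Untested sketch: the Dockerfile from the thread below, plus the MASTER
# environment variable that interpreter-settings.json maps to spark.master.
FROM apache/zeppelin:0.8.2

# Install vim and pip for Python 3, then a Python 3 pyspark (as in the thread)
RUN apt-get -y update &&\
    DEBIAN_FRONTEND=noninteractive \
    apt -y install vim python3-pip

RUN python3 -m pip install -U pyspark

ENV PYSPARK_PYTHON python3
ENV PYSPARK_DRIVER_PYTHON python3

# MASTER is the envName listed for the master property in
# interpreter-settings.json; the intent is that the embedded Spark
# then starts with spark.master=local[*].
ENV MASTER local[*]
```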
On Sun, May 17, 2020 at 3:13 PM Patrik Iselind <patrik....@gmail.com> wrote:

> Thanks a lot for creating the issue. It seems I am not allowed to.
>
> As I understand it, the environment variable is supposed to be SPARK_MASTER.
>
> On Sun, May 17, 2020 at 11:56 AM Alex Ott <alex...@gmail.com> wrote:
>
>> OK, I've created a JIRA for it:
>> https://issues.apache.org/jira/browse/ZEPPELIN-4821 and I'm working on a patch.
>>
>> I'm not sure about the environment variable name: it's simply MASTER. Should
>> it be `SPARK_MASTER`, or is it a requirement of CDH and other Hadoop
>> distributions to have it as MASTER?
>>
>> On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <patrik....@gmail.com> wrote:
>>
>>> Hi Alex,
>>>
>>> Thanks a lot for helping out with this.
>>>
>>> You're correct, but it doesn't seem that the interpreter-settings.json for the
>>> Spark interpreter is what's actually being used. It's conf/interpreter.json.
>>> In that file, both 0.8.2 and 0.9.0 have
>>>
>>> ```partial-json
>>> "spark": {
>>>   "id": "spark",
>>>   "name": "spark",
>>>   "group": "spark",
>>>   "properties": {
>>>     "SPARK_HOME": {
>>>       "name": "SPARK_HOME",
>>>       "value": "",
>>>       "type": "string",
>>>       "description": "Location of spark distribution"
>>>     },
>>>     "master": {
>>>       "name": "master",
>>>       "value": "local[*]",
>>>       "type": "string",
>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
>>>     },
>>> ```
>>>
>>> That "master" should be "spark.master".
>>>
>>> By adding an explicit spark.master property with the value "local[*]" I can use
>>> all cores as expected (a sketch of the resulting conf/interpreter.json section
>>> follows at the end of this thread). Without it, printing sc.master gives "local";
>>> with spark.master set to "local[*]", printing sc.master gives "local[*]". My
>>> conclusion is that conf/interpreter.json isn't in sync with the
>>> interpreter-settings.json for the Spark interpreter.
>>>
>>> Best regards,
>>> Patrik Iselind
>>>
>>> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:
>>>
>>>> Spark master is set to `local[*]` by default. Here is the corresponding piece
>>>> from interpreter-settings.json for the Spark interpreter:
>>>>
>>>>     "master": {
>>>>       "envName": "MASTER",
>>>>       "propertyName": "spark.master",
>>>>       "defaultValue": "local[*]",
>>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>>>>       "type": "string"
>>>>     },
>>>>
>>>> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>>>
>>>> PI> Hi Jeff,
>>>>
>>>> PI> I've tried the release from http://zeppelin.apache.org/download.html,
>>>> PI> both in a Docker container and without one. They both have the same
>>>> PI> issue as previously described.
>>>>
>>>> PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using
>>>> PI> some environment variable?
>>>>
>>>> PI> When is the next Zeppelin 0.9.0 Docker image planned to be released?
>>>>
>>>> PI> Best Regards,
>>>> PI> Patrik Iselind
>>>>
>>>> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> PI>   Hi Patrik,
>>>> PI>
>>>> PI>   Would you mind trying the 0.9.0-preview? It might be an issue with the
>>>> PI>   Docker container.
>>>> PI>
>>>> PI>   http://zeppelin.apache.org/download.html
>>>> PI>
>>>> PI>   On Sun, May 10, 2020 at 2:30 AM Patrik Iselind <patrik....@gmail.com> wrote:
>>>> PI>
>>>> PI>     Hello Jeff,
>>>> PI>
>>>> PI>     Thank you for looking into this for me.
>>>> PI>
>>>> PI>     Using the latest pushed Docker image for 0.9.0 (image ID 92890adfadfb,
>>>> PI>     built 6 weeks ago), I still see the same issue. My image has the digest
>>>> PI>     "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>>> PI>
>>>> PI>     If it's not on the tip of master, could you guys please release a newer
>>>> PI>     0.9.0 image?
>>>> PI>
>>>> PI>     Best Regards,
>>>> PI>     Patrik Iselind
>>>> PI>
>>>> PI>     On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>> PI>
>>>> PI>       This might be a bug in 0.8. I tried it in 0.9 (master branch) and it
>>>> PI>       works for me.
>>>> PI>
>>>> PI>       print(sc.master)
>>>> PI>       print(sc.defaultParallelism)
>>>> PI>
>>>> PI>       ---
>>>> PI>       local[*]
>>>> PI>       8
>>>> PI>
>>>> PI>       On Sat, May 9, 2020 at 8:34 PM Patrik Iselind <patrik....@gmail.com> wrote:
>>>> PI>
>>>> PI>         Hi,
>>>> PI>
>>>> PI>         First comes some background, then I have some questions.
>>>> PI>
>>>> PI>         Background
>>>> PI>         I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
>>>> PI>         looks like this:
>>>> PI>
>>>> PI>         ```Dockerfile
>>>> PI>         FROM apache/zeppelin:0.8.2
>>>> PI>
>>>> PI>         # Install vim and pip for Python 3
>>>> PI>         RUN apt-get -y update &&\
>>>> PI>             DEBIAN_FRONTEND=noninteractive \
>>>> PI>             apt -y install vim python3-pip
>>>> PI>
>>>> PI>         RUN python3 -m pip install -U pyspark
>>>> PI>
>>>> PI>         ENV PYSPARK_PYTHON python3
>>>> PI>         ENV PYSPARK_DRIVER_PYTHON python3
>>>> PI>         ```
>>>> PI>
>>>> PI>         When I run a paragraph like this
>>>> PI>
>>>> PI>         ```Zeppelin paragraph
>>>> PI>         %pyspark
>>>> PI>
>>>> PI>         print(sc)
>>>> PI>         print()
>>>> PI>         print(dir(sc))
>>>> PI>         print()
>>>> PI>         print(sc.master)
>>>> PI>         print()
>>>> PI>         print(sc.defaultParallelism)
>>>> PI>         ```
>>>> PI>
>>>> PI>         I get the following output
>>>> PI>
>>>> PI>         ```output
>>>> PI>         <SparkContext master=local appName=Zeppelin>
>>>> PI>
>>>> PI>         ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>>> PI>         '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>>> PI>         '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>>> PI>         '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>>> PI>         '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>>> PI>         '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
>>>> PI>         '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap',
>>>> PI>         '_do_init', '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>>>> PI>         '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>>>> PI>         '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>>>> PI>         '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>>>> PI>         'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>>>> PI>         'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>>>> PI>         'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>>>> PI>         'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>>>> PI>         'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize',
>>>> PI>         'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
>>>> PI>         'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup',
>>>> PI>         'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles',
>>>> PI>         'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile',
>>>> PI>         'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>>> PI>
>>>> PI>         local
>>>> PI>
>>>> PI>         1
>>>> PI>         ```
>>>> PI>
>>>> PI>         This even though the "master" property in the interpreter is set to
>>>> PI>         "local[*]". I'd like to use all cores on my machine. To do that I have
>>>> PI>         to explicitly create the "spark.master" property in the Spark
>>>> PI>         interpreter with the value "local[*]"; then I get
>>>> PI>
>>>> PI>         ```new output
>>>> PI>         <SparkContext master=local[*] appName=Zeppelin>
>>>> PI>
>>>> PI>         ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>>> PI>         '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>>> PI>         '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>>> PI>         '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>>> PI>         '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>>> PI>         '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context',
>>>> PI>         '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap',
>>>> PI>         '_do_init', '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>>>> PI>         '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>>>> PI>         '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>>>> PI>         '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>>>> PI>         'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>>>> PI>         'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>>>> PI>         'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>>>> PI>         'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>>>> PI>         'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize',
>>>> PI>         'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range',
>>>> PI>         'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup',
>>>> PI>         'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles',
>>>> PI>         'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile',
>>>> PI>         'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>>> PI>
>>>> PI>         local[*]
>>>> PI>
>>>> PI>         8
>>>> PI>         ```
>>>> PI>
>>>> PI>         This is what I want.
>>>> PI>
>>>> PI>         The Questions
>>>> PI>         1. Why is the "master" property not used in the created SparkContext?
>>>> PI>         2. How do I add the spark.master property to the Docker image?
>>>> PI>
>>>> PI>         Any hint or support you can provide would be greatly appreciated.
>>>> PI>
>>>> PI>         Yours Sincerely,
>>>> PI>         Patrik Iselind
>>>> PI>
>>>> PI>       --
>>>> PI>       Best Regards
>>>> PI>
>>>> PI>       Jeff Zhang
>>>> PI>
>>>> PI>   --
>>>> PI>   Best Regards
>>>> PI>
>>>> PI>   Jeff Zhang
>>>>
>>>> --
>>>> With best wishes, Alex Ott
>>>> http://alexott.net/
>>>> Twitter: alexott_en (English), alexott (Russian)
>>
>> --
>> With best wishes, Alex Ott
>> http://alexott.net/
>> Twitter: alexott_en (English), alexott (Russian)

--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
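For anyone landing on this thread from a search: the workaround Patrik confirmed above is to add an explicit spark.master property next to the existing master entry. A minimal sketch of how that part of conf/interpreter.json could then look, reusing the snippet quoted earlier in the thread (local[*] is just the local-mode example discussed here; adjust for yarn or standalone masters):

```partial-json
"master": {
  "name": "master",
  "value": "local[*]",
  "type": "string",
  "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
},
"spark.master": {
  "name": "spark.master",
  "value": "local[*]",
  "type": "string",
  "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
},
```

The same property can also be added from the Spark interpreter's settings page in the Zeppelin UI instead of editing the file by hand; restart the interpreter afterwards so the new value is picked up.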