Thanks a lot for creating the issue; it seems I am not allowed to create one myself. As I understand it, the environment variable is supposed to be SPARK_MASTER.
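A quick way to see which of the two candidate names the interpreter actually honours is to check both from a %pyspark paragraph. This is only a rough sketch in the style of the paragraphs quoted below: it assumes the Zeppelin-provided SparkContext is available as sc, and neither MASTER nor SPARK_MASTER is guaranteed to be set in a given image.

```Zeppelin paragraph
%pyspark
import os

# MASTER and SPARK_MASTER are the two candidate names discussed in this
# thread; either, both, or neither may be set in your container.
for name in ("MASTER", "SPARK_MASTER"):
    print(name, "=", os.environ.get(name, "<not set>"))

# sc is the SparkContext that Zeppelin injects into %pyspark paragraphs.
print("sc.master          =", sc.master)
print("defaultParallelism =", sc.defaultParallelism)
```

Two more sketches follow after the quoted thread, addressing the two questions raised in the original message.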
On Sun, May 17, 2020 at 11:56 AM Alex Ott <alex...@gmail.com> wrote:

> Ok, I've created a JIRA for it:
> https://issues.apache.org/jira/browse/ZEPPELIN-4821 and I'm working on a patch.
>
> I'm not sure about the environment variable name - it's simply MASTER. Should
> it be `SPARK_MASTER`, or is it a requirement of CDH and other Hadoop
> distributions to have it as MASTER?
>
> On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <patrik....@gmail.com> wrote:
>
>> Hi Alex,
>>
>> Thanks a lot for helping out with this.
>>
>> You're correct, but it doesn't seem that it's the interpreter-settings.json
>> for the Spark interpreter that is being used. It's conf/interpreter.json.
>> In this file both 0.8.2 and 0.9.0 have
>>
>> ```partial-json
>> "spark": {
>>   "id": "spark",
>>   "name": "spark",
>>   "group": "spark",
>>   "properties": {
>>     "SPARK_HOME": {
>>       "name": "SPARK_HOME",
>>       "value": "",
>>       "type": "string",
>>       "description": "Location of spark distribution"
>>     },
>>     "master": {
>>       "name": "master",
>>       "value": "local[*]",
>>       "type": "string",
>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
>>     },
>> ```
>>
>> That "master" should be "spark.master".
>>
>> By adding an explicit spark.master with the value "local[*]" I can use all
>> cores as expected. Without it, printing sc.master gives "local". With the
>> spark.master property set to "local[*]", printing sc.master gives "local[*]".
>> My conclusion is that conf/interpreter.json isn't in sync with the
>> interpreter-settings.json for the Spark interpreter.
>>
>> Best regards,
>> Patrik Iselind
>>
>> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:
>>
>>> Spark master is set to `local[*]` by default. Here is the corresponding
>>> piece from interpreter-settings.json for the Spark interpreter:
>>>
>>>     "master": {
>>>       "envName": "MASTER",
>>>       "propertyName": "spark.master",
>>>       "defaultValue": "local[*]",
>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>>>       "type": "string"
>>>     },
>>>
>>> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>>
>>> PI> Hi Jeff,
>>> PI>
>>> PI> I've tried the release from http://zeppelin.apache.org/download.html,
>>> PI> both in a docker and without a docker. They both have the same issue
>>> PI> as previously described.
>>> PI>
>>> PI> Can I somehow set spark.master to "local[*]" in zeppelin, perhaps
>>> PI> using some environment variable?
>>> PI>
>>> PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
>>> PI>
>>> PI> Best Regards,
>>> PI> Patrik Iselind
>>> PI>
>>> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>> PI>
>>> PI> Hi Patric,
>>> PI>
>>> PI> Do you mind trying the 0.9.0-preview? It might be an issue of the
>>> PI> docker container.
>>> PI>
>>> PI> http://zeppelin.apache.org/download.html
>>> PI>
>>> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sun, May 10, 2020 at 2:30 AM:
>>> PI>
>>> PI> Hello Jeff,
>>> PI>
>>> PI> Thank you for looking into this for me.
>>> PI>
>>> PI> Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb,
>>> PI> built 6 weeks ago), I still see the same issue. My image has the digest
>>> PI> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>> PI>
>>> PI> If it's not on the tip of master, could you guys please release a
>>> PI> newer 0.9.0 image?
>>> PI>
>>> PI> Best Regards,
>>> PI> Patrik Iselind
>>> PI>
>>> PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>> PI>
>>> PI> This might be a bug in 0.8; I tried that in 0.9 (master branch) and it
>>> PI> works for me.
>>> PI>
>>> PI> print(sc.master)
>>> PI> print(sc.defaultParallelism)
>>> PI>
>>> PI> ---
>>> PI> local[*] 8
>>> PI>
>>> PI> Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:
>>> PI>
>>> PI> Hi,
>>> PI>
>>> PI> First comes some background, then I have some questions.
>>> PI>
>>> PI> Background
>>> PI> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Docker
>>> PI> file looks like this:
>>> PI>
>>> PI> ```Dockerfile
>>> PI> FROM apache/zeppelin:0.8.2
>>> PI>
>>> PI> # Install Java and some tools
>>> PI> RUN apt-get -y update &&\
>>> PI>     DEBIAN_FRONTEND=noninteractive \
>>> PI>     apt -y install vim python3-pip
>>> PI>
>>> PI> RUN python3 -m pip install -U pyspark
>>> PI>
>>> PI> ENV PYSPARK_PYTHON python3
>>> PI> ENV PYSPARK_DRIVER_PYTHON python3
>>> PI> ```
>>> PI>
>>> PI> When I start a section like so
>>> PI>
>>> PI> ```Zeppelin paragraph
>>> PI> %pyspark
>>> PI>
>>> PI> print(sc)
>>> PI> print()
>>> PI> print(dir(sc))
>>> PI> print()
>>> PI> print(sc.master)
>>> PI> print()
>>> PI> print(sc.defaultParallelism)
>>> PI> ```
>>> PI>
>>> PI> I get the following output
>>> PI>
>>> PI> ```output
>>> PI> <SparkContext master=local appName=Zeppelin>
>>> PI>
>>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
>>> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
>>> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
>>> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator',
>>> PI> '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
>>> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
>>> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
>>> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
>>> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
>>> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
>>> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
>>> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
>>> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile',
>>> PI> 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty',
>>> PI> 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome',
>>> PI> 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile',
>>> PI> 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>> PI>
>>> PI> local
>>> PI>
>>> PI> 1
>>> PI> ```
>>> PI>
>>> PI> This even though the "master" property in the interpreter is set to
>>> PI> "local[*]". I'd like to use all cores on my machine. To do that I have
>>> PI> to explicitly create the "spark.master" property in the spark
>>> PI> interpreter with the value "local[*]", then I get
>>> PI>
>>> PI> ```new output
>>> PI> <SparkContext master=local[*] appName=Zeppelin>
>>> PI>
>>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
>>> PI> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
>>> PI> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
>>> PI> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
>>> PI> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
>>> PI> '__subclasshook__', '__weakref__', '_accumulatorServer',
>>> PI> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
>>> PI> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
>>> PI> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator',
>>> PI> '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
>>> PI> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
>>> PI> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
>>> PI> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
>>> PI> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
>>> PI> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
>>> PI> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
>>> PI> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
>>> PI> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile',
>>> PI> 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty',
>>> PI> 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome',
>>> PI> 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile',
>>> PI> 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>> PI>
>>> PI> local[*]
>>> PI>
>>> PI> 8
>>> PI> ```
>>> PI>
>>> PI> This is what I want.
>>> PI>
>>> PI> The Questions
>>> PI>
>>> PI> 1. Why is the "master" property not used in the created SparkContext?
>>> PI> 2. How do I add the spark.master property to the docker image?
>>> PI>
>>> PI> Any hint or support you can provide would be greatly appreciated.
>>> PI>
>>> PI> Yours Sincerely,
>>> PI> Patrik Iselind
>>>
>>> PI> --
>>> PI> Best Regards
>>> PI>
>>> PI> Jeff Zhang
>>>
>>> PI> --
>>> PI> Best Regards
>>> PI>
>>> PI> Jeff Zhang
>>>
>>> --
>>> With best wishes, Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
>>
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
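On the first question in the quoted thread ("Why is the master property not used?"), the underlying Spark behaviour is easy to reproduce outside Zeppelin: whatever value ends up in spark.master decides the parallelism, and a bare "local" pins Spark to a single core. Below is a minimal sketch, assuming a plain pyspark installation with no Zeppelin involved; the application name is arbitrary.

```python
# Plain PySpark, outside Zeppelin: spark.master controls the parallelism.
# Change "local[*]" to "local" to reproduce the single-core case from the thread.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")       # all local cores
         .appName("master-check")  # arbitrary name, just for this sketch
         .getOrCreate())

sc = spark.sparkContext
print(sc.master)              # local[*]
print(sc.defaultParallelism)  # number of local cores
spark.stop()
```

So the Zeppelin-side question is really why the interpreter's "master" setting never reaches spark.master, which matches the conclusion above that conf/interpreter.json and interpreter-settings.json are out of sync.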
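On the second question (how to bake spark.master into the Docker image), one possible approach is to patch conf/interpreter.json during the image build, before Zeppelin starts. This is only a sketch under assumptions: the file path and the JSON nesting used below are pieced together from the snippet quoted above, the file may not exist until Zeppelin has been started once, and the layout can differ between 0.8.x and 0.9.x, so verify against your own interpreter.json first.

```python
# Hypothetical build-time helper: add a spark.master property to interpreter.json.
# Path and JSON nesting are assumptions; check them against your own file.
import json

INTERPRETER_JSON = "/zeppelin/conf/interpreter.json"  # assumed location in the image

with open(INTERPRETER_JSON) as f:
    config = json.load(f)

# The quoted interpreter.json snippet shows Spark properties as objects with
# name/value/type/description, so the new entry mirrors that shape.
properties = config["interpreterSettings"]["spark"]["properties"]
properties["spark.master"] = {
    "name": "spark.master",
    "value": "local[*]",
    "type": "string",
    "description": "Spark master uri",
}

with open(INTERPRETER_JSON, "w") as f:
    json.dump(config, f, indent=2)
```

Editing the file by hand and COPYing it into the image achieves the same thing; the script is just a repeatable version of that edit.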