OK, I've created a JIRA for it: https://issues.apache.org/jira/browse/ZEPPELIN-4821 and am working on a patch.
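For reference, the entry in conf/interpreter.json would presumably end up looking roughly like this after the rename (a sketch based on the snippets quoted below, not the actual patch):

```partial-json
"spark": {
  "properties": {
    "spark.master": {
      "name": "spark.master",
      "value": "local[*]",
      "type": "string",
      "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
    },
```

Only the property key and its "name" field change from "master" to "spark.master"; the rest of the entry stays as it is today.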
I'm not sure about the environment variable name - it's simply MASTER. Should it be `SPARK_MASTER`, or is having it as MASTER a requirement of CDH and other Hadoop distributions?

On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <patrik....@gmail.com> wrote:

> Hi Alex,
>
> Thanks a lot for helping out with this.
>
> You're correct, but it doesn't seem that it's the interpreter-settings.json for the Spark interpreter that is being used. It's conf/interpreter.json. In this file both 0.8.2 and 0.9.0 have
>
> ```partial-json
> "spark": {
>   "id": "spark",
>   "name": "spark",
>   "group": "spark",
>   "properties": {
>     "SPARK_HOME": {
>       "name": "SPARK_HOME",
>       "value": "",
>       "type": "string",
>       "description": "Location of spark distribution"
>     },
>     "master": {
>       "name": "master",
>       "value": "local[*]",
>       "type": "string",
>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
>     },
> ```
>
> That "master" should be "spark.master".
>
> By adding an explicit spark.master with the value "local[*]" I can use all cores as expected. Without it, printing sc.master gives "local"; with the spark.master property set to "local[*]", printing sc.master gives "local[*]". My conclusion is that conf/interpreter.json isn't in sync with the interpreter-settings.json for the Spark interpreter.
>
> Best regards,
> Patrik Iselind
>
> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:
>
>> Spark master is set to `local[*]` by default. Here is the corresponding piece from interpreter-settings.json for the Spark interpreter:
>>
>>     "master": {
>>       "envName": "MASTER",
>>       "propertyName": "spark.master",
>>       "defaultValue": "local[*]",
>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>>       "type": "string"
>>     },
>>
>> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>
>> PI> Hi Jeff,
>> PI>
>> PI> I've tried the release from http://zeppelin.apache.org/download.html, both in Docker and without Docker. They both have the same issue as previously described.
>> PI>
>> PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using some environment variable?
>> PI>
>> PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
>> PI>
>> PI> Best Regards,
>> PI> Patrik Iselind
>> PI>
>> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>> PI>
>> PI> Hi Patrik,
>> PI>
>> PI> Do you mind trying the 0.9.0-preview? It might be an issue with the docker container.
>> PI>
>> PI> http://zeppelin.apache.org/download.html
>> PI>
>> PI> On Sun, May 10, 2020 at 2:30 AM Patrik Iselind <patrik....@gmail.com> wrote:
>> PI>
>> PI> Hello Jeff,
>> PI>
>> PI> Thank you for looking into this for me.
>> PI>
>> PI> Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb, built 6 weeks ago), I still see the same issue. My image has the digest "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>> PI>
>> PI> If it's not on the tip of master, could you guys please release a newer 0.9.0 image?
>> PI>
>> PI> Best Regards,
>> PI> Patrik Iselind
>> PI>
>> PI> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>> PI>
>> PI> This might be a bug in 0.8; I tried that in 0.9 (master branch) and it works for me.
>> PI>
>> PI> print(sc.master)
>> PI> print(sc.defaultParallelism)
>> PI>
>> PI> ---
>> PI> local[*]
>> PI> 8
>> PI>
>> PI> On Sat, May 9, 2020 at 8:34 PM Patrik Iselind <patrik....@gmail.com> wrote:
>> PI>
>> PI> Hi,
>> PI>
>> PI> First comes some background, then I have some questions.
>> PI>
>> PI> Background
>> PI> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile looks like this:
>> PI>
>> PI> ```Dockerfile
>> PI> FROM apache/zeppelin:0.8.2
>> PI>
>> PI> # Install Java and some tools
>> PI> RUN apt-get -y update &&\
>> PI>     DEBIAN_FRONTEND=noninteractive \
>> PI>     apt -y install vim python3-pip
>> PI>
>> PI> RUN python3 -m pip install -U pyspark
>> PI>
>> PI> ENV PYSPARK_PYTHON python3
>> PI> ENV PYSPARK_DRIVER_PYTHON python3
>> PI> ```
>> PI>
>> PI> When I start a section like so
>> PI>
>> PI> ```Zeppelin paragraph
>> PI> %pyspark
>> PI>
>> PI> print(sc)
>> PI> print()
>> PI> print(dir(sc))
>> PI> print()
>> PI> print(sc.master)
>> PI> print()
>> PI> print(sc.defaultParallelism)
>> PI> ```
>> PI>
>> PI> I get the following output
>> PI>
>> PI> ```output
>> PI> <SparkContext master=local appName=Zeppelin>
>> PI>
>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway', '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars', '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>> PI>
>> PI> local
>> PI>
>> PI> 1
>> PI> ```
>> PI>
>> PI> This is even though the "master" property in the interpreter is set to "local[*]". I'd like to use all cores on my machine.
>> PI> To do that I have to explicitly create the "spark.master" property in the spark interpreter with the value "local[*]", and then I get
>> PI>
>> PI> ```new output
>> PI> <SparkContext master=local[*] appName=Zeppelin>
>> PI>
>> PI> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway', '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars', '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>> PI>
>> PI> local[*]
>> PI>
>> PI> 8
>> PI> ```
>> PI>
>> PI> This is what I want.
>> PI>
>> PI> The Questions
>> PI> 1. Why is the "master" property not used in the created SparkContext?
>> PI> 2. How do I add the spark.master property to the docker image?
>> PI>
>> PI> Any hint or support you can provide would be greatly appreciated.
>> PI>
>> PI> Yours Sincerely,
>> PI> Patrik Iselind

--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
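For the Docker image question in the quoted thread, one untested sketch is to extend the thread's Dockerfile and set the MASTER environment variable that interpreter-settings.json maps to spark.master; whether 0.8.2 actually honors it depends on the same master/spark.master mismatch tracked in ZEPPELIN-4821.

```Dockerfile
FROM apache/zeppelin:0.8.2

# Untested sketch: interpreter-settings.json maps the MASTER environment
# variable to the spark.master property, so setting it here may be enough
# once the master/spark.master mismatch in conf/interpreter.json is resolved.
ENV MASTER="local[*]"
```

The same variable can also be passed at container start with `docker run -e MASTER="local[*]" ...` instead of baking it into the image.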