Hi Jeff,

Ok, I'll update the PR this evening to remove the environment variable... Although in some cases (for example for Docker) an environment variable could be handier - I need to look into it; maybe I'll rework that part so that an environment variable can also be used.

SparkLauncher picks up the Spark master from `master` first, then from `spark.master`, and otherwise just uses `local[*]`. But in several places in the Spark interpreter I have seen that only `spark.master` is read directly - I think that is what causes the problem.
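To make that lookup order concrete, here is a tiny Python sketch of the precedence described above (illustrative only - `resolve_master` is a hypothetical helper, not Zeppelin's actual Java code). A code path that reads `spark.master` directly skips the first step and never sees a value stored only under `master`:

```python
# Illustrative sketch of the precedence described above; not Zeppelin's real code.
def resolve_master(properties):
    """Return the effective Spark master from an interpreter property map."""
    return (properties.get("master")           # legacy property name
            or properties.get("spark.master")  # preferred property name
            or "local[*]")                     # default

# A value stored only under the legacy "master" key is honored here...
print(resolve_master({"master": "local[*]"}))      # -> local[*]
# ...but a lookup of "spark.master" alone would miss it entirely.
print({"master": "local[*]"}.get("spark.master"))  # -> None
```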
On Mon, May 18, 2020 at 8:56 AM Jeff Zhang <zjf...@gmail.com> wrote:

> The env name in interpreter.json and interpreter-setting.json is not used.
> We should remove them.
>
> I still don't understand how master & spark.master would affect the
> behavior. `master` is legacy stuff that we introduced a very long time ago;
> we definitely should use spark.master instead. But internally we do
> translate master to spark.master, so I'm not sure why it would cause this
> issue - maybe there is some bug.
>
> Alex Ott <alex...@gmail.com> wrote on Sun, May 17, 2020 at 9:36 PM:
>
>> I've seen somewhere in the CDH documentation that they use MASTER, that's
>> why I'm asking...
>>
>> On Sun, May 17, 2020 at 3:13 PM Patrik Iselind <patrik....@gmail.com>
>> wrote:
>>
>>> Thanks a lot for creating the issue. It seems I am not allowed to.
>>>
>>> As I understand it, the environment variable is supposed to be
>>> SPARK_MASTER.
>>>
>>> On Sun, May 17, 2020 at 11:56 AM Alex Ott <alex...@gmail.com> wrote:
>>>
>>>> Ok, I've created a JIRA for it:
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-4821 and am working on a
>>>> patch.
>>>>
>>>> I'm not sure about the environment variable name - it's simply MASTER.
>>>> Should it be `SPARK_MASTER`, or is it a requirement of CDH and other
>>>> Hadoop distributions to have it as MASTER?
>>>>
>>>> On Sat, May 16, 2020 at 3:45 PM Patrik Iselind <patrik....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Thanks a lot for helping out with this.
>>>>>
>>>>> You're correct, but it doesn't seem that it's the
>>>>> interpreter-setting.json for the Spark interpreter that is being used;
>>>>> it's conf/interpreter.json. In this file, both 0.8.2 and 0.9.0 have
>>>>>
>>>>> ```partial-json
>>>>> "spark": {
>>>>>   "id": "spark",
>>>>>   "name": "spark",
>>>>>   "group": "spark",
>>>>>   "properties": {
>>>>>     "SPARK_HOME": {
>>>>>       "name": "SPARK_HOME",
>>>>>       "value": "",
>>>>>       "type": "string",
>>>>>       "description": "Location of spark distribution"
>>>>>     },
>>>>>     "master": {
>>>>>       "name": "master",
>>>>>       "value": "local[*]",
>>>>>       "type": "string",
>>>>>       "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077"
>>>>>     },
>>>>> ```
>>>>>
>>>>> That "master" should be "spark.master".
>>>>>
>>>>> By adding an explicit spark.master property with the value "local[*]" I
>>>>> can use all cores as expected. Without it, printing sc.master gives
>>>>> "local"; with spark.master set to "local[*]", printing sc.master gives
>>>>> "local[*]". My conclusion is that conf/interpreter.json isn't in sync
>>>>> with the interpreter-setting.json for the Spark interpreter.
>>>>>
>>>>> Best regards,
>>>>> Patrik Iselind
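(For anyone hitting the same thing: a quick way to double-check which master actually took effect is a short %pyspark paragraph like the sketch below - Zeppelin provides sc, and the exact numbers depend on your machine.)

```python
# Minimal sanity check for a %pyspark paragraph; `sc` is the SparkContext
# that Zeppelin creates for the notebook.
print(sc.master)              # expect "local[*]" once spark.master is set
print(sc.defaultParallelism)  # expect the number of local cores, not 1

# parallelize() without an explicit numSlices follows defaultParallelism,
# so the partition count is another quick way to see how many cores are used.
print(sc.parallelize(range(100)).getNumPartitions())
```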
>>>>> On Sat, May 16, 2020 at 11:22 AM Alex Ott <alex...@gmail.com> wrote:
>>>>>
>>>>>> Spark master is set to `local[*]` by default. Here is the corresponding
>>>>>> piece from interpreter-setting.json for the Spark interpreter:
>>>>>>
>>>>>>   "master": {
>>>>>>     "envName": "MASTER",
>>>>>>     "propertyName": "spark.master",
>>>>>>     "defaultValue": "local[*]",
>>>>>>     "description": "Spark master uri. local | yarn-client | yarn-cluster | spark master address of standalone mode, ex) spark://master_host:7077",
>>>>>>     "type": "string"
>>>>>>   },
>>>>>>
>>>>>> Patrik Iselind at "Sun, 10 May 2020 20:31:08 +0200" wrote:
>>>>>>
>>>>>> PI> Hi Jeff,
>>>>>> PI>
>>>>>> PI> I've tried the release from http://zeppelin.apache.org/download.html,
>>>>>> PI> both in a docker and without a docker. They both have the same issue
>>>>>> PI> as previously described.
>>>>>> PI>
>>>>>> PI> Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps
>>>>>> PI> using some environment variable?
>>>>>> PI>
>>>>>> PI> When is the next Zeppelin 0.9.0 docker image planned to be released?
>>>>>> PI>
>>>>>> PI> Best Regards,
>>>>>> PI> Patrik Iselind
>>>>>> PI>
>>>>>> PI> On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>> PI>
>>>>>> PI>     Hi Patrik,
>>>>>> PI>
>>>>>> PI>     Do you mind trying the 0.9.0-preview? It might be an issue of the
>>>>>> PI>     docker container.
>>>>>> PI>
>>>>>> PI>     http://zeppelin.apache.org/download.html
>>>>>> PI>
>>>>>> PI>     Patrik Iselind <patrik....@gmail.com> wrote on Sun, May 10, 2020 at 2:30 AM:
>>>>>> PI>
>>>>>> PI>         Hello Jeff,
>>>>>> PI>
>>>>>> PI>         Thank you for looking into this for me.
>>>>>> PI>
>>>>>> PI>         Using the latest pushed docker image for 0.9.0 (image ID
>>>>>> PI>         92890adfadfb, built 6 weeks ago), I still see the same issue.
>>>>>> PI>         My image has the digest
>>>>>> PI>         "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>>>>> PI>
>>>>>> PI>         If it's not on the tip of master, could you guys please
>>>>>> PI>         release a newer 0.9.0 image?
>>>>>> PI>
>>>>>> PI>         Best Regards,
>>>>>> PI>         Patrik Iselind
>>>>>> PI>
>>>>>> PI>         On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>> PI>
>>>>>> PI>             This might be a bug in 0.8; I tried it in 0.9 (master
>>>>>> PI>             branch) and it works for me.
>>>>>> PI>
>>>>>> PI>             print(sc.master)
>>>>>> PI>             print(sc.defaultParallelism)
>>>>>> PI>
>>>>>> PI>             ---
>>>>>> PI>             local[*]
>>>>>> PI>             8
>>>>>> PI>
>>>>>> PI>             Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:
>>>>>> PI>
>>>>>> PI>                 Hi,
>>>>>> PI>
>>>>>> PI>                 First comes some background, then I have some
>>>>>> PI>                 questions.
>>>>>> PI>
>>>>>> PI>                 Background
>>>>>> PI>                 I'm trying out Zeppelin 0.8.2 based on the Docker
>>>>>> PI>                 image.
>>>>>> PI>                 My Dockerfile looks like this:
>>>>>> PI>
>>>>>> PI>                 ```Dockerfile
>>>>>> PI>                 FROM apache/zeppelin:0.8.2
>>>>>> PI>
>>>>>> PI>                 # Install some extra tools
>>>>>> PI>                 RUN apt-get -y update &&\
>>>>>> PI>                     DEBIAN_FRONTEND=noninteractive \
>>>>>> PI>                     apt -y install vim python3-pip
>>>>>> PI>
>>>>>> PI>                 RUN python3 -m pip install -U pyspark
>>>>>> PI>
>>>>>> PI>                 ENV PYSPARK_PYTHON python3
>>>>>> PI>                 ENV PYSPARK_DRIVER_PYTHON python3
>>>>>> PI>                 ```
>>>>>> PI>
>>>>>> PI>                 When I run a paragraph like this
>>>>>> PI>
>>>>>> PI>                 ```Zeppelin paragraph
>>>>>> PI>                 %pyspark
>>>>>> PI>
>>>>>> PI>                 print(sc)
>>>>>> PI>                 print()
>>>>>> PI>                 print(dir(sc))
>>>>>> PI>                 print()
>>>>>> PI>                 print(sc.master)
>>>>>> PI>                 print()
>>>>>> PI>                 print(sc.defaultParallelism)
>>>>>> PI>                 ```
>>>>>> PI>
>>>>>> PI>                 I get the following output:
>>>>>> PI>
>>>>>> PI>                 ```output
>>>>>> PI>                 <SparkContext master=local appName=Zeppelin>
>>>>>> PI>
>>>>>> PI>                 ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway', '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars', '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>>>>> PI>
>>>>>> PI>                 local
>>>>>> PI>
>>>>>> PI>                 1
>>>>>> PI>                 ```
>>>>>> PI>
>>>>>> PI>                 This is despite the "master" property in the
>>>>>> PI>                 interpreter being set to "local[*]". I'd like to use
>>>>>> PI>                 all cores on my machine.
>>>>>> PI>                 To do that I have to explicitly create a
>>>>>> PI>                 "spark.master" property in the Spark interpreter with
>>>>>> PI>                 the value "local[*]"; then I get
>>>>>> PI>
>>>>>> PI>                 ```new output
>>>>>> PI>                 <SparkContext master=local[*] appName=Zeppelin>
>>>>>> PI>
>>>>>> PI>                 ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway', '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars', '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer', 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel', 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser', 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union', 'version', 'wholeTextFiles']
>>>>>> PI>
>>>>>> PI>                 local[*]
>>>>>> PI>
>>>>>> PI>                 8
>>>>>> PI>                 ```
>>>>>> PI>
>>>>>> PI>                 This is what I want.
>>>>>> PI>
>>>>>> PI>                 The Questions
>>>>>> PI>                 - Why is the "master" property not used in the created SparkContext?
>>>>>> PI>                 - How do I add the spark.master property to the docker image?
>>>>>> PI>
>>>>>> PI>                 Any hint or support you can provide would be greatly
>>>>>> PI>                 appreciated.
>>>>>> PI>
>>>>>> PI>                 Yours Sincerely,
>>>>>> PI>                 Patrik Iselind
>
> --
> Best Regards
>
> Jeff Zhang

--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)