This might be a bug in 0.8; I tried the same thing in 0.9 (master branch) and it works for me.
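If you also want to confirm that the extra cores are actually picked up by the RDD API, and not just reported by the context, a quick sanity check (only a sketch; the exact numbers depend on your machine) is to look at how many partitions a plain parallelize() call ends up with:

```
%pyspark

# With master=local[*], defaultParallelism should equal the number of local
# cores, and parallelize() with no explicit numSlices should create that
# many partitions.
rdd = sc.parallelize(range(1000))
print(sc.defaultParallelism)
print(rdd.getNumPartitions())
```

For reference, this is what I ran and got on 0.9: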
print(sc.master)
print(sc.defaultParallelism)

---

local[*]
8

Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:

> Hi,
>
> First comes some background, then I have some questions.
>
> *Background*
> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
> looks like this:
>
> ```Dockerfile
> FROM apache/zeppelin:0.8.2
>
> # Install Java and some tools
> RUN apt-get -y update &&\
>     DEBIAN_FRONTEND=noninteractive \
>     apt -y install vim python3-pip
>
> RUN python3 -m pip install -U pyspark
>
> ENV PYSPARK_PYTHON python3
> ENV PYSPARK_DRIVER_PYTHON python3
> ```
>
> When I run a paragraph like this
>
> ```Zeppelin paragraph
> %pyspark
>
> print(sc)
> print()
> print(dir(sc))
> print()
> print(sc.master)
> print()
> print(sc.defaultParallelism)
> ```
>
> I get the following output
>
> ```output
> <SparkContext master=local appName=Zeppelin>
>
> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
> '__subclasshook__', '__weakref__', '_accumulatorServer',
> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc',
> '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
> 'version', 'wholeTextFiles']
>
> local
>
> 1
> ```
>
> This is despite the "master" property in the interpreter being set to
> "local[*]". I'd like to use all cores on my machine.
> To do that, I have to explicitly create the "spark.master" property in the
> Spark interpreter with the value "local[*]"; then I get
>
> ```new output
> <SparkContext master=local[*] appName=Zeppelin>
>
> ['PACKAGE_EXTENSIONS', '__class__', '__delattr__', '__dict__', '__dir__',
> '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__',
> '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__init__',
> '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
> '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
> '__subclasshook__', '__weakref__', '_accumulatorServer',
> '_active_spark_context', '_batchSize', '_callsite', '_checkpointFile',
> '_conf', '_dictToJavaMap', '_do_init', '_ensure_initialized', '_gateway',
> '_getJavaStorageLevel', '_initialize_context', '_javaAccumulator', '_jsc',
> '_jvm', '_lock', '_next_accum_id', '_pickled_broadcast_vars',
> '_python_includes', '_repr_html_', '_temp_dir', '_unbatched_serializer',
> 'accumulator', 'addFile', 'addPyFile', 'appName', 'applicationId',
> 'binaryFiles', 'binaryRecords', 'broadcast', 'cancelAllJobs',
> 'cancelJobGroup', 'defaultMinPartitions', 'defaultParallelism',
> 'dump_profiles', 'emptyRDD', 'environment', 'getConf', 'getLocalProperty',
> 'getOrCreate', 'hadoopFile', 'hadoopRDD', 'master', 'newAPIHadoopFile',
> 'newAPIHadoopRDD', 'parallelize', 'pickleFile', 'profiler_collector',
> 'pythonExec', 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
> 'version', 'wholeTextFiles']
>
> local[*]
>
> 8
> ```
>
> This is what I want.
>
> *The Questions*
>
> - Why is the "master" property not used in the created SparkContext?
> - How do I add the spark.master property to the Docker image?
>
> Any hint or support you can provide would be greatly appreciated.
>
> Yours Sincerely,
> Patrik Iselind

--
Best Regards

Jeff Zhang
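P.S. On the second question (getting spark.master into the Docker image): one possible approach, only a sketch that has not been verified against the 0.8.2 image, is to set the MASTER environment variable that conf/zeppelin-env.sh documents as the Spark master URL for the Spark interpreter, so the context comes up as local[*] without touching the interpreter settings UI:

```Dockerfile
FROM apache/zeppelin:0.8.2

# ... same vim/pip/pyspark setup as in the Dockerfile above ...

# Unverified sketch: conf/zeppelin-env.sh documents MASTER as the Spark
# master URL, so pinning it here should make new SparkContexts start as
# local[*] without editing the interpreter settings by hand.
ENV MASTER local[*]
```

Another option would be to pre-populate the spark.master property in conf/interpreter.json before Zeppelin starts, but keeping that file in sync with a release is more fragile than a single environment variable.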