This might be a bug in 0.8. I tried the same thing in 0.9 (master branch) and it
works for me:

```
print(sc.master)
print(sc.defaultParallelism)
```

```output
local[*]
8
```

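For your second question (getting spark.master into the Docker image), one thing you could try is baking the setting into conf/zeppelin-env.sh when you build the image. This is only an untested sketch on my side: it assumes the base image sets ZEPPELIN_HOME, and that the Spark interpreter picks up the MASTER environment variable from zeppelin-env.sh as its default master.

```Dockerfile
FROM apache/zeppelin:0.8.2

# Untested sketch: append MASTER to zeppelin-env.sh so the Spark interpreter
# defaults to local[*]. Assumes ZEPPELIN_HOME is set by the base image.
RUN echo "export MASTER=local[*]" >> "$ZEPPELIN_HOME/conf/zeppelin-env.sh"
```

I have not verified that this avoids the 0.8 behaviour you are seeing; moving to 0.9 is the cleaner fix.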

Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:

> Hi,
>
> First comes some background, then I have some questions.
>
> *Background*
> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Docker file
> looks like this:
>
> ```Dockerfile
> FROM apache/zeppelin:0.8.2
>
>
> # Install some extra tools
> RUN apt-get -y update &&\
>     DEBIAN_FRONTEND=noninteractive \
>         apt-get -y install vim python3-pip
>
> RUN python3 -m pip install -U pyspark
>
> ENV PYSPARK_PYTHON python3
> ENV PYSPARK_DRIVER_PYTHON python3
> ```
>
> When I run a paragraph like this
>
> ```Zeppelin paragraph
> %pyspark
>
> print(sc)
> print()
> print(dir(sc))
> print()
> print(sc.master)
> print()
> print(sc.defaultParallelism)
> ```
>
> I get the following output
>
> ```output
> <SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
> 'version', 'wholeTextFiles'] local 1
> ```
>
> This is even though the "master" property in the Spark interpreter is set
> to "local[*]". I'd like to use all cores on my machine. To do that, I have
> to explicitly create the "spark.master" property in the Spark interpreter
> with the value "local[*]"; then I get
>
> ```new output
> <SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
> 'version', 'wholeTextFiles'] local[*] 8
> ```
> This is what I want.
>
> *The Questions*
>
>    - Why is the "master" property not used in the created SparkContext?
>    - How do I add the spark.master property to the Docker image?
>
>
> Any hint or support you can provide would be greatly appreciated.
>
> Yours Sincerely,
> Patrik Iselind
>


-- 
Best Regards

Jeff Zhang
