Hello Jeff,

Thank you for looking into this for me.

Using the latest pushed Docker image for 0.9.0 (image ID 92890adfadfb,
built 6 weeks ago), I still see the same issue. My image has the digest
"apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".

If the published image isn't built from the tip of master, could you please
release a newer 0.9.0 image?

Best Regards,
Patrik Iselind


On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:

> This might be a bug in 0.8. I tried it in 0.9 (master branch), and it works
> for me.
>
> print(sc.master)
> print(sc.defaultParallelism)
>
> ---
> local[*] 8
>
>
> Patrik Iselind <patrik....@gmail.com> wrote on Sat, May 9, 2020 at 8:34 PM:
>
>> Hi,
>>
>> First comes some background, then I have some questions.
>>
>> *Background*
>> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
>> looks like this:
>>
>> ```Dockerfile
>> FROM apache/zeppelin:0.8.2
>>
>>
>> # Install some extra tools (vim and pip for Python 3)
>> RUN apt-get -y update && \
>>     DEBIAN_FRONTEND=noninteractive \
>>         apt-get -y install vim python3-pip
>>
>> RUN python3 -m pip install -U pyspark
>>
>> ENV PYSPARK_PYTHON python3
>> ENV PYSPARK_DRIVER_PYTHON python3
>> ```
>>
>> When I run a paragraph like this:
>>
>> ```Zeppelin paragraph
>> %pyspark
>>
>> print(sc)
>> print()
>> print(dir(sc))
>> print()
>> print(sc.master)
>> print()
>> print(sc.defaultParallelism)
>> ```
>>
>> I get the following output
>>
>> ```output
>> <SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
>> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
>> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
>> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
>> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
>> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
>> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
>> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
>> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
>> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
>> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>> 'version', 'wholeTextFiles'] local 1
>> ```
>>
>> This happens even though the "master" property in the Spark interpreter
>> settings is set to "local[*]". I'd like to use all the cores on my machine.
>> To do that, I have to explicitly create a "spark.master" property in the
>> Spark interpreter with the value "local[*]"; only then do I get
>>
>> ```new output
>> <SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
>> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
>> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
>> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
>> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
>> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
>> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
>> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
>> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
>> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
>> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>> 'version', 'wholeTextFiles'] local[*] 8
>> ```
>> This is what I want.
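>>
>> For reference, the minimal check I now keep in a %pyspark paragraph to see
>> which value actually reached the SparkContext looks like this (just a small
>> sketch; it assumes the stock setup where Zeppelin injects sc into the
>> paragraph):
>>
>> ```Zeppelin paragraph
>> %pyspark
>>
>> # sc is the SparkContext Zeppelin injects into the paragraph.
>> # getConf() returns the effective SparkConf, so this shows which
>> # spark.master value actually won.
>> print(sc.master)
>> print(sc.getConf().get("spark.master"))
>> print(sc.defaultParallelism)
>> ```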
>>
>> *The Questions*
>>
>>    - Why is the "master" property not used in the created SparkContext?
>>    - How do I add the spark.master property to the Docker image? (A rough
>>    guess at this is sketched below.)
>>
>>
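>> For the second question, a rough guess at what I could add to my Dockerfile
>> is sketched below. It relies on the MASTER variable that
>> zeppelin-env.sh.template documents for the Spark master URL; whether the
>> official image actually honours it when set this way is exactly what I'm
>> unsure about, so please treat it as an assumption, not a verified recipe:
>>
>> ```Dockerfile
>> FROM apache/zeppelin:0.8.2
>>
>> # Assumption: exporting MASTER here makes the Spark interpreter default
>> # to local[*] instead of plain local. Unverified on my side.
>> ENV MASTER=local[*]
>> ```
>>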
>> Any hint or support you can provide would be greatly appreciated.
>>
>> Yours Sincerely,
>> Patrik Iselind
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
