Hi Patrik,

Would you mind trying the 0.9.0-preview? It might be an issue with the docker
container.

http://zeppelin.apache.org/download.html
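
For example, something like this should pull and start it for a quick check
(the exact preview tag on Docker Hub may differ, so treat the tag below as a
placeholder):

```shell
# Pull a 0.9.0 preview image and expose the Zeppelin UI on port 8080.
# The tag is an assumption; use whatever preview tag is actually published.
docker pull apache/zeppelin:0.9.0
docker run --rm -p 8080:8080 --name zeppelin apache/zeppelin:0.9.0
```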



On Sun, May 10, 2020 at 2:30 AM Patrik Iselind <patrik....@gmail.com> wrote:

> Hello Jeff,
>
> Thank you for looking into this for me.
>
> Using the latest pushed docker image for 0.9.0 (image ID 92890adfadfb,
> built 6 weeks ago), I still see the same issue. My image has the digest
> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>
> If the published image isn't built from the tip of master, could you please
> release a newer 0.9.0 image?
>
> Best Regards,
> Patrik Iselind
>
>
> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>
>> This might be a bug in 0.8. I tried it in 0.9 (master branch) and it works
>> for me:
>>
>> print(sc.master)
>> print(sc.defaultParallelism)
>>
>> ---
>> local[*] 8
>>
>>
>> On Sat, May 9, 2020 at 8:34 PM Patrik Iselind <patrik....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> First comes some background, then I have some questions.
>>>
>>> *Background*
>>> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
>>> looks like this:
>>>
>>> ```Dockerfile
>>> FROM apache/zeppelin:0.8.2
>>>
>>>
>>> # Install some extra tools (vim, pip for Python 3)
>>> RUN apt-get -y update &&\
>>>     DEBIAN_FRONTEND=noninteractive \
>>>         apt-get -y install vim python3-pip
>>>
>>> RUN python3 -m pip install -U pyspark
>>>
>>> ENV PYSPARK_PYTHON python3
>>> ENV PYSPARK_DRIVER_PYTHON python3
>>> ```
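>>>
>>> I build and run it roughly like this (the image name is just a placeholder
>>> I use locally):
>>>
>>> ```shell
>>> # Build the customised image and run it with the Zeppelin UI on port 8080
>>> docker build -t my-zeppelin:0.8.2 .
>>> docker run --rm -p 8080:8080 my-zeppelin:0.8.2
>>> ```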
>>>
>>> When I run a paragraph like this
>>>
>>> ```Zeppelin paragraph
>>> %pyspark
>>>
>>> print(sc)
>>> print()
>>> print(dir(sc))
>>> print()
>>> print(sc.master)
>>> print()
>>> print(sc.defaultParallelism)
>>> ```
>>>
>>> I get the following output
>>>
>>> ```output
>>> <SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
>>> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
>>> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
>>> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
>>> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
>>> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
>>> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
>>> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
>>> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>>> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>>> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>>> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>>> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>>> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>>> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>>> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>>> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
>>> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
>>> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>>> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>>> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>>> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>>> 'version', 'wholeTextFiles'] local 1
>>> ```
>>>
>>> This is despite the "master" property in the interpreter being set to
>>> "local[*]". I'd like to use all cores on my machine. To do that, I have to
>>> explicitly create the "spark.master" property in the spark
>>> interpreter with the value "local[*]"; then I get
>>>
>>> ```new output
>>> <SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
>>> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
>>> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
>>> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
>>> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
>>> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
>>> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
>>> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
>>> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>>> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>>> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>>> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>>> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>>> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>>> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>>> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>>> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
>>> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
>>> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>>> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>>> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>>> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>>> 'version', 'wholeTextFiles'] local[*] 8
>>> ```
>>> This is what I want.
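>>>
>>> As a sanity check, the same value can be read back from the SparkConf with
>>> the plain pyspark API (nothing Zeppelin-specific here):
>>>
>>> ```Zeppelin paragraph
>>> %pyspark
>>>
>>> # Read the master back from the running context's SparkConf; with the
>>> # workaround above this reports local[*] instead of plain local.
>>> print(sc.getConf().get("spark.master"))
>>> print(sc.defaultParallelism)
>>> ```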
>>>
>>> *The Questions*
>>>
>>>    - Why is the "master" property not used in the created SparkContext?
>>>    - How do I add the spark.master property to the docker image? (A rough
>>>      idea is sketched below, but I'm not sure it's correct.)
>>>
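>>> For the second question, the only idea I have so far is to append the
>>> setting to zeppelin-env.sh at build time, roughly as below. This is only a
>>> sketch: the /zeppelin/conf path and the use of the MASTER variable are
>>> assumptions on my part, not something I have verified.
>>>
>>> ```Dockerfile
>>> FROM apache/zeppelin:0.8.2
>>>
>>> # Sketch only: the conf path inside the image and the MASTER variable
>>> # are assumptions; adjust to whatever the base image actually uses.
>>> RUN echo "export MASTER=local[*]" >> /zeppelin/conf/zeppelin-env.sh
>>> ```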
>>>
>>> Any hint or support you can provide would be greatly appreciated.
>>>
>>> Yours Sincerely,
>>> Patrik Iselind
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>

-- 
Best Regards

Jeff Zhang
