Hi Jeff,

I've tried the release from http://zeppelin.apache.org/download.html, both
in a Docker container and without Docker. Both show the same issue as
described previously.

Can I somehow set spark.master to "local[*]" in Zeppelin, perhaps using
some environment variable?
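
For example, something along these lines in my Dockerfile is roughly what I
have in mind (untested and only a sketch; I'm assuming Zeppelin would pick up
a MASTER variable from the container environment the same way it does when it
is exported in conf/zeppelin-env.sh):

```Dockerfile
FROM apache/zeppelin:0.8.2

# Assumption: Zeppelin reads MASTER from the environment (as when it is
# exported in conf/zeppelin-env.sh) and uses it as the Spark master URL.
ENV MASTER local[*]
```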

When is the next Zeppelin 0.9.0 Docker image planned to be released?

Best Regards,
Patrik Iselind


On Sun, May 10, 2020 at 9:26 AM Jeff Zhang <zjf...@gmail.com> wrote:

> Hi Patrik,
>
> Do you mind trying the 0.9.0-preview? It might be an issue with the Docker
> container.
>
> http://zeppelin.apache.org/download.html
>
>
>
> Patrik Iselind <patrik....@gmail.com> wrote on Sunday, May 10, 2020 at 2:30 AM:
>
>> Hello Jeff,
>>
>> Thank you for looking into this for me.
>>
>> Using the latest pushed Docker image for 0.9.0 (image ID 92890adfadfb,
>> built 6 weeks ago), I still see the same issue. My image has the digest
>> "apache/zeppelin@sha256:0691909f6884319d366f5d3a5add8802738d6240a83b2e53e980caeb6c658092".
>>
>> If that image isn't built from the tip of master, could you guys please
>> release a newer 0.9.0 image?
>>
>> Best Regards,
>> Patrik Iselind
>>
>>
>> On Sat, May 9, 2020 at 4:03 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> This might be a bug in 0.8. I tried it in 0.9 (master branch), and it
>>> works for me.
>>>
>>> print(sc.master)
>>> print(sc.defaultParallelism)
>>>
>>> ---
>>> local[*] 8
>>>
>>>
>>> Patrik Iselind <patrik....@gmail.com> wrote on Saturday, May 9, 2020 at 8:34 PM:
>>>
>>>> Hi,
>>>>
>>>> First comes some background, then I have some questions.
>>>>
>>>> *Background*
>>>> I'm trying out Zeppelin 0.8.2 based on the Docker image. My Dockerfile
>>>> looks like this:
>>>>
>>>> ```Dockerfile
>>>> FROM apache/zeppelin:0.8.2
>>>>
>>>>
>>>> # Install some extra tools (vim and pip for Python 3)
>>>> RUN apt-get -y update && \
>>>>     DEBIAN_FRONTEND=noninteractive \
>>>>         apt-get -y install vim python3-pip
>>>>
>>>> RUN python3 -m pip install -U pyspark
>>>>
>>>> ENV PYSPARK_PYTHON python3
>>>> ENV PYSPARK_DRIVER_PYTHON python3
>>>> ```
>>>>
>>>> When I run a paragraph like this
>>>>
>>>> ```Zeppelin paragraph
>>>> %pyspark
>>>>
>>>> print(sc)
>>>> print()
>>>> print(dir(sc))
>>>> print()
>>>> print(sc.master)
>>>> print()
>>>> print(sc.defaultParallelism)
>>>> ```
>>>>
>>>> I get the following output
>>>>
>>>> ```output
>>>> <SparkContext master=local appName=Zeppelin> ['PACKAGE_EXTENSIONS',
>>>> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
>>>> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
>>>> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
>>>> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
>>>> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
>>>> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
>>>> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
>>>> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>>>> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>>>> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>>>> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>>>> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>>>> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>>>> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>>>> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>>>> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
>>>> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
>>>> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>>>> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>>>> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>>>> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>>>> 'version', 'wholeTextFiles'] local 1
>>>> ```
>>>>
>>>> This is despite the "master" property in the interpreter being set to
>>>> "local[*]". I'd like to use all cores on my machine. To get that, I have
>>>> to explicitly create the "spark.master" property in the Spark interpreter
>>>> with the value "local[*]"; then I get
>>>>
>>>> ```new output
>>>> <SparkContext master=local[*] appName=Zeppelin> ['PACKAGE_EXTENSIONS',
>>>> '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
>>>> '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__',
>>>> '__getnewargs__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
>>>> '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
>>>> '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
>>>> '__weakref__', '_accumulatorServer', '_active_spark_context', '_batchSize',
>>>> '_callsite', '_checkpointFile', '_conf', '_dictToJavaMap', '_do_init',
>>>> '_ensure_initialized', '_gateway', '_getJavaStorageLevel',
>>>> '_initialize_context', '_javaAccumulator', '_jsc', '_jvm', '_lock',
>>>> '_next_accum_id', '_pickled_broadcast_vars', '_python_includes',
>>>> '_repr_html_', '_temp_dir', '_unbatched_serializer', 'accumulator',
>>>> 'addFile', 'addPyFile', 'appName', 'applicationId', 'binaryFiles',
>>>> 'binaryRecords', 'broadcast', 'cancelAllJobs', 'cancelJobGroup',
>>>> 'defaultMinPartitions', 'defaultParallelism', 'dump_profiles', 'emptyRDD',
>>>> 'environment', 'getConf', 'getLocalProperty', 'getOrCreate', 'hadoopFile',
>>>> 'hadoopRDD', 'master', 'newAPIHadoopFile', 'newAPIHadoopRDD',
>>>> 'parallelize', 'pickleFile', 'profiler_collector', 'pythonExec',
>>>> 'pythonVer', 'range', 'runJob', 'sequenceFile', 'serializer',
>>>> 'setCheckpointDir', 'setJobGroup', 'setLocalProperty', 'setLogLevel',
>>>> 'setSystemProperty', 'show_profiles', 'sparkHome', 'sparkUser',
>>>> 'startTime', 'statusTracker', 'stop', 'textFile', 'uiWebUrl', 'union',
>>>> 'version', 'wholeTextFiles'] local[*] 8
>>>> ```
>>>> This is what I want.
>>>>
>>>> *The Questions*
>>>>
>>>>    - Why is the "master" property not used in the created SparkContext?
>>>>    - How do I add the spark.master property to the Docker image? (See
>>>>    the sketch below.)
>>>>
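>>>> For the second question, something like this is roughly what I have in
>>>> mind (untested and only a sketch; I'm assuming Zeppelin honors
>>>> SPARK_SUBMIT_OPTIONS from the container environment the same way it does
>>>> when it is exported in conf/zeppelin-env.sh):
>>>>
>>>> ```Dockerfile
>>>> FROM apache/zeppelin:0.8.2
>>>>
>>>> # Assumption: Zeppelin passes SPARK_SUBMIT_OPTIONS on to spark-submit,
>>>> # so "--master local[*]" would override the master the interpreter
>>>> # defaults to.
>>>> ENV SPARK_SUBMIT_OPTIONS="--master local[*]"
>>>> ```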
>>>>
>>>> Any hint or support you can provide would be greatly appreciated.
>>>>
>>>> Yours Sincerely,
>>>> Patrik Iselind
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>
