RE: Time to start publishing Spark Docker Images?

2021-08-13 Thread Bode, Meikel, NMA-CFD
Hi all, I am Meikel Bode and only an interested reader of the dev and user lists. Anyway, I would appreciate having official Docker images available. Maybe one could get inspiration from the Jupyter Docker Stacks and provide a hierarchy of different images like this: https://jupyter-docker-stacks.

Re: Time to start publishing Spark Docker Images?

2021-08-13 Thread Mich Talebzadeh
Agreed. I have already built a few of the latest for Spark and PYSpark on 3.1.1 with Java 8, as I found out that Java 11 does not work with the Google BigQuery data warehouse. However, how to hack the Dockerfile is something one finds out the hard way, for example how to add additional Python libraries like tensorflow etc. Loadin

Re: Time to start publishing Spark Docker Images?

2021-08-13 Thread Mich Talebzadeh
should read PySpark

Re: Time to start publishing Spark Docker Images?

2021-08-13 Thread Holden Karau
So we actually do have a script that does the build already; it's more a matter of publishing the results for easier use. Currently the script produces three images: spark, spark-py, and spark-r. I can certainly see a solid reason to publish like with a jdk11 & jdk8 suffix as well if there is interes
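
To make the image/JDK combination concrete, here is a minimal Python sketch that enumerates the tags such a publishing step might produce. The three image names come from the message above; the `<image>:<version>-<jdk>` tag layout, the `image_tags` helper, and the version string are assumptions for illustration, not the naming used by the actual build script.

```python
from itertools import product

# Image names are taken from the thread; the JDK suffix scheme and the
# tag layout are assumptions made purely for illustration.
IMAGES = ["spark", "spark-py", "spark-r"]
JDKS = ["jdk8", "jdk11"]


def image_tags(spark_version, images=IMAGES, jdks=JDKS):
    """Return one '<image>:<version>-<jdk>' tag per image/JDK pair."""
    return [
        f"{image}:{spark_version}-{jdk}"
        for image, jdk in product(images, jdks)
    ]


# Hypothetical usage: list every tag for a single Spark release.
for tag in image_tags("3.1.1"):
    print(tag)
```

Three images and two JDK suffixes give six tags per Spark release, so the number of published artifacts grows quickly with every extra dimension.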

Re: Time to start publishing Spark Docker Images?

2021-08-13 Thread Mich Talebzadeh
Hi, we can cater for multiple types (spark, spark-py and spark-r) and Spark versions (assuming they are downloaded and available). The challenge is that the Docker images built are snapshots. They cannot be amended later and if you change anything by going inside docker, as soon as you are logge

Creating docker images for Data Science

2021-08-13 Thread Mich Talebzadeh
Hi all, I am intending to create a docker image with Python 3.1.1 and Java 8 to include Python libraries for Data Science. Other versions with Java 11 will come later. The build process is automated with specific dockerfiles for different purposes. I am intending to install the following packag