We actually already have a script that does the build, so it's more a
matter of publishing the results for easier use. Currently the script
produces three images: spark, spark-py, and spark-r. I can certainly see a
solid reason to also publish variants with jdk11 and jdk8 suffixes if there
is interest in the community. If we want, say, a spark-py-pandas container
image with everything necessary for the Koalas stuff to work, then I think
that would be a great PR for someone to add :)
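
For anyone who wants to try this locally, here is a rough sketch of how the
three images might be built per JDK, assuming the script in question is
bin/docker-image-tool.sh from an unpacked Spark distribution. The repository
name, the version, and the -jdk8 / -jdk11 tag suffixes are placeholders, and
the java_image_tag values are only example base image tags. It prints the
commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints docker-image-tool.sh invocations instead of running them.
# REPO and VERSION are placeholder values, not project defaults.
REPO="${REPO:-docker.io/myrepo}"
VERSION="${VERSION:-3.1.1}"

# build_cmd JAVA_IMAGE_TAG TAG_SUFFIX
# Composes one invocation that builds spark, spark-py and spark-r together.
# The relative paths assume the command is run from the distribution root.
build_cmd() {
  echo "./bin/docker-image-tool.sh" \
    "-r $REPO -t $VERSION-$2" \
    "-b java_image_tag=$1" \
    "-p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile" \
    "-R ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile" \
    "build"
}

# Hypothetical tag suffixes, one set of images per JDK.
build_cmd 8-jre-slim jdk8
build_cmd 11-jre-slim jdk11
```

Copying a printed line and running it from the root of the distribution
should then do the actual build, Docker permitting.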

On Fri, Aug 13, 2021 at 1:00 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> should read PySpark
>
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 13 Aug 2021 at 08:51, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Agreed.
>>
>> I have already built a few recent images for Spark and PYSpark on 3.1.1
>> with Java 8, as I found out that Java 11 does not work with the Google
>> BigQuery data warehouse. However, how to adapt the Dockerfile is something
>> one finds out the hard way.
>>
>> For example, how to add additional Python libraries such as tensorflow.
>> Loading these libraries through Kubernetes is not practical, as unzipping
>> and installing them through --py-files etc. takes considerable time, so
>> they need to be added to the Dockerfile at build time, in the directory
>> for Python under Kubernetes:
>>
>> /opt/spark/kubernetes/dockerfiles/spark/bindings/python
>>
>> RUN pip install pyyaml numpy cx_Oracle tensorflow ....
>>
>> Also, you will need curl to test the ports from inside the container:
>>
>> RUN apt-get update && apt-get install -y curl
>> RUN ["apt-get","install","-y","vim"]
>>
>> As I said, I am happy to build these specific Dockerfiles plus the
>> complete documentation for them. I have already built one for Google (GCP).
>> The difference between the Spark and PySpark versions is that in
>> Spark/Scala a fat jar file will contain everything needed. That is not the
>> case with Python, I am afraid.
>>
>> HTH
>>
>> On Fri, 13 Aug 2021 at 08:13, Bode, Meikel, NMA-CFD <
>> meikel.b...@bertelsmann.de> wrote:
>>
>>> Hi all,
>>>
>>>
>>>
>>> I am Meikel Bode and only an interested reader of the dev and user lists.
>>> Anyway, I would appreciate having official Docker images available.
>>>
>>> Maybe one could get inspiration from the Jupyter Docker Stacks and
>>> provide a hierarchy of different images like this:
>>>
>>>
>>>
>>>
>>> https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships
>>>
>>>
>>>
>>> That is, a core image supporting only Java, and extended images
>>> supporting Python and/or R, etc.
>>>
>>>
>>>
>>> Looking forward to the discussion.
>>>
>>>
>>>
>>> Best,
>>>
>>> Meikel
>>>
>>>
>>>
>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>> *Sent:* Friday, 13 August 2021 08:45
>>> *Cc:* dev <dev@spark.apache.org>
>>> *Subject:* Re: Time to start publishing Spark Docker Images?
>>>
>>>
>>>
>>> I concur this is a good idea and certainly worth exploring.
>>>
>>>
>>>
>>> In practice, preparing deployable Docker images will throw up some
>>> challenges, because a Docker image for Spark is not really a single
>>> modular unit in the way that, say, a Docker image for Jenkins is. It
>>> involves different versions and different images for Spark and PySpark,
>>> and will most likely end up as part of a Kubernetes deployment.
>>>
>>>
>>>
>>> Individuals and organisations will deploy it as a first cut. Great, but
>>> I equally feel that good documentation on how to build a consumable,
>>> deployable image will be more valuable. From my own experience, the
>>> current documentation should be enhanced, for example on how to deploy
>>> working directories, add additional Python packages, build with different
>>> Java versions (version 8 or version 11), etc.
>>>
>>>
>>>
>>> HTH
>>>
>>> On Fri, 13 Aug 2021 at 01:54, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>> Awesome, I've filed an INFRA ticket to get the ball rolling.
>>>
>>>
>>>
>>> On Thu, Aug 12, 2021 at 5:48 PM John Zhuge <jzh...@apache.org> wrote:
>>>
>>> +1
>>>
>>>
>>>
>>> On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon <gurwls...@gmail.com>
>>> wrote:
>>>
>>> +1, I think we generally agreed upon having it. Thanks Holden for the
>>> heads-up and for driving this.
>>>
>>> +@Dongjoon Hyun <dongj...@apache.org> FYI
>>>
>>>
>>>
>>> On Thu, Jul 22, 2021 at 12:22 PM, Kent Yao <yaooq...@gmail.com> wrote:
>>>
>>> +1
>>>
>>>
>>>
>>> Bests,
>>>
>>>
>>>
>>> *Kent Yao*
>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>> *a spark enthusiast*
>>> *kyuubi* <https://github.com/yaooqinn/kyuubi> is a unified multi-tenant
>>> JDBC interface for large-scale data processing and analytics, built on
>>> top of Apache Spark <http://spark.apache.org/>.
>>> *spark-authorizer* <https://github.com/yaooqinn/spark-authorizer>: a
>>> Spark SQL extension which provides SQL Standard Authorization for Apache
>>> Spark.
>>> *spark-postgres* <https://github.com/yaooqinn/spark-postgres>: a library
>>> for reading data from and transferring data to Postgres / Greenplum with
>>> Spark SQL and DataFrames, 10~100x faster.
>>> *itatchi* <https://github.com/yaooqinn/spark-func-extras>: a library
>>> that brings useful functions from various modern database management
>>> systems to Apache Spark.
>>>
>>>
>>>
>>>
>>>
>>> On 07/22/2021 11:13, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>> Hi Folks,
>>>
>>>
>>>
>>> Many other distributed computing (
>>> https://hub.docker.com/r/rayproject/ray
>>> https://hub.docker.com/u/daskdev)
>>> and ASF projects (https://hub.docker.com/u/apache)
>>> now publish their images to dockerhub.
>>>
>>>
>>>
>>> We've already got the docker image tooling in place. I think we'd need
>>> to ask the ASF to grant permissions to the PMC to publish containers, and
>>> update the release steps, but I think this could be useful for folks.
>>>
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Holden
>>>
>>>
>>>
>>> --
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>> --
>>>
>>> John Zhuge
>>>
>>>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
