So we actually do have a script that does the build already; it's more a matter of publishing the results for easier use. Currently the script produces three images: spark, spark-py, and spark-r. I can certainly see a solid reason to also publish variants with jdk11 and jdk8 suffixes if there is interest in the community. If we want to have, say, a spark-py-pandas Spark container image with everything necessary for the Koalas stuff to work, then I think that could be a great PR for someone to add :)
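For anyone who wants the JDK-suffixed variants before anything is published, here is a minimal sketch that only prints the docker-image-tool.sh invocations rather than running them. It assumes a Spark 3.x binary distribution layout (bin/docker-image-tool.sh plus the Python and R Dockerfiles under kubernetes/dockerfiles/spark/bindings), and it assumes that the -b flag forwards a java_image_tag build arg that selects the OpenJDK base image; check ./bin/docker-image-tool.sh -h for your version. The registry docker.io/myrepo, the version 3.1.1 and the <version>-jdk<N> tag scheme are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) the docker-image-tool.sh commands that would
# build the spark, spark-py and spark-r images once per JDK, tagging each set
# with a "-jdk<N>" suffix. Run the printed commands from the root of an
# unpacked Spark 3.x distribution.

REPO="docker.io/myrepo"    # hypothetical registry/namespace
SPARK_VERSION="3.1.1"      # hypothetical version string used in the tag
BINDINGS="./kubernetes/dockerfiles/spark/bindings"

# Compose the tag for one JDK major version, e.g. "3.1.1-jdk11".
image_tag() {
  printf '%s-jdk%s\n' "$SPARK_VERSION" "$1"
}

# Print one build command per JDK major version given as an argument.
# -p / -R opt in to the PySpark and SparkR images; -b is assumed to forward
# java_image_tag to the base Dockerfile (check docker-image-tool.sh -h).
print_build_commands() {
  local jdk
  for jdk in "$@"; do
    echo "./bin/docker-image-tool.sh -r $REPO -t $(image_tag "$jdk")" \
      "-b java_image_tag=${jdk}-jre-slim" \
      "-p $BINDINGS/python/Dockerfile" \
      "-R $BINDINGS/R/Dockerfile build"
  done
}

print_build_commands 8 11
```

Printing instead of executing keeps the sketch side-effect free; running the printed lines would build the three images per JDK, and the same -r/-t pair with push instead of build would publish them.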
On Fri, Aug 13, 2021 at 1:00 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> should read PySpark
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Fri, 13 Aug 2021 at 08:51, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Agreed.
>>
>> I have already built a few of the latest images for Spark and PYSpark on
>> 3.1.1 with Java 8, as I found out that Java 11 does not work with the
>> Google BigQuery data warehouse. However, how to hack the Dockerfile is
>> something one finds out the hard way.
>>
>> For example, how to add additional Python libraries like tensorflow.
>> Loading these libraries through Kubernetes is not practical, as unzipping
>> and installing them through --py-files etc. takes considerable time, so
>> they need to be added to the Dockerfile at build time, in the directory
>> for Python under Kubernetes:
>>
>> /opt/spark/kubernetes/dockerfiles/spark/bindings/python
>>
>> RUN pip install pyyaml numpy cx_Oracle tensorflow ....
>>
>> You will also need curl to test the ports from inside the container:
>>
>> RUN apt-get update && apt-get install -y curl
>> RUN ["apt-get","install","-y","vim"]
>>
>> As I said, I am happy to build these specific Dockerfiles plus the
>> complete documentation for them. I have already built one for Google
>> (GCP). The difference between the Spark and PySpark versions is that in
>> Spark/Scala a fat jar file will contain everything needed. That is not the
>> case with Python, I am afraid.
>>
>> HTH
>>
>> On Fri, 13 Aug 2021 at 08:13, Bode, Meikel, NMA-CFD <
>> meikel.b...@bertelsmann.de> wrote:
>>
>>> Hi all,
>>>
>>> I am Meikel Bode, only an interested reader of the dev and user lists.
>>> Anyway, I would appreciate having official Docker images available.
>>>
>>> Maybe one could get inspiration from the Jupyter Docker Stacks and
>>> provide a hierarchy of different images like this:
>>>
>>> https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships
>>>
>>> That is, a core image supporting only Java, an extended one supporting
>>> Python and/or R, etc.
>>>
>>> Looking forward to the discussion.
>>>
>>> Best,
>>>
>>> Meikel
>>>
>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>> *Sent:* Friday, 13 August 2021 08:45
>>> *Cc:* dev <dev@spark.apache.org>
>>> *Subject:* Re: Time to start publishing Spark Docker Images?
>>>
>>> I concur this is a good idea and certainly worth exploring.
>>>
>>> In practice, preparing deployable Docker images will pose some
>>> challenges, because a Docker image for Spark is not really a single
>>> modular unit in the way that, say, a Docker image for Jenkins is. It
>>> involves different versions and different images for Spark and PySpark,
>>> and will most likely end up as part of a Kubernetes deployment.
>>>
>>> Individuals and organisations will deploy it as a first cut. Great, but
>>> I equally feel that good documentation on how to build a consumable,
>>> deployable image will be more valuable. From my own experience, the
>>> current documentation should be enhanced, for example on how to deploy
>>> working directories and additional Python packages, and how to build with
>>> different Java versions (8 or 11).
>>>
>>> HTH
>>>
>>> On Fri, 13 Aug 2021 at 01:54, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>> Awesome, I've filed an INFRA ticket to get the ball rolling.
>>>
>>> On Thu, Aug 12, 2021 at 5:48 PM John Zhuge <jzh...@apache.org> wrote:
>>>
>>> +1
>>>
>>> On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon <gurwls...@gmail.com>
>>> wrote:
>>>
>>> +1, I think we generally agreed upon having it. Thanks Holden for the
>>> heads-up and for driving this.
>>>
>>> +@Dongjoon Hyun <dongj...@apache.org> FYI
>>>
>>> On Thu, Jul 22, 2021 at 12:22 PM Kent Yao <yaooq...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> Bests,
>>>
>>> *Kent Yao*
>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>> *a spark enthusiast*
>>> *kyuubi* <https://github.com/yaooqinn/kyuubi> is a unified multi-tenant
>>> JDBC interface for large-scale data processing and analytics, built on
>>> top of Apache Spark <http://spark.apache.org/>.
>>> *spark-authorizer* <https://github.com/yaooqinn/spark-authorizer> A Spark
>>> SQL extension which provides SQL Standard Authorization for Apache Spark
>>> <http://spark.apache.org/>.
>>> *spark-postgres* <https://github.com/yaooqinn/spark-postgres> A library
>>> for reading data from and transferring data to Postgres / Greenplum with
>>> Spark SQL and DataFrames, 10~100x faster.
>>> *itatchi* <https://github.com/yaooqinn/spark-func-extras> A library that
>>> brings useful functions from various modern database management systems
>>> to Apache Spark <http://spark.apache.org/>.
>>>
>>> On 07/22/2021 11:13, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>> Hi Folks,
>>>
>>> Many other distributed computing projects
>>> (https://hub.docker.com/r/rayproject/ray, https://hub.docker.com/u/daskdev)
>>> and ASF projects (https://hub.docker.com/u/apache) now publish their
>>> images to Docker Hub.
>>>
>>> We've already got the Docker image tooling in place. I think we'd need to
>>> ask the ASF to grant permissions to the PMC to publish containers and
>>> update the release steps, but I think this could be useful for folks.
>>>
>>> Cheers,
>>>
>>> Holden
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>> --
>>> John Zhuge
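The quoted point about baking extra Python packages into the PySpark image at build time, rather than shipping them with --py-files, and the spark-py-pandas idea from the top reply, could both be approached by layering a small Dockerfile on top of an existing spark-py image. The sketch below only generates such a Dockerfile. The base image name and package list are hypothetical; it assumes the base image provides apt-get and pip3 (the stock spark-py image is Debian-based and installs python3-pip), and it assumes the default Spark uid of 185, which will differ if the image was built with a custom -u uid.

```shell
#!/usr/bin/env bash
# Sketch only: generate a Dockerfile that layers extra Python packages (plus
# curl for checking ports from inside the container) on top of an existing
# spark-py image, so they are installed once at build time.
# Assumptions: the base image has apt-get and pip3, and Spark runs as uid 185
# (the default unless the image was built with docker-image-tool.sh -u).

# Print the derived Dockerfile to stdout.
#   $1     base image, e.g. docker.io/myrepo/spark-py:3.1.1-jdk8 (hypothetical)
#   $2...  Python packages to pip-install into the image
derived_pyspark_dockerfile() {
  if [ "$#" -lt 2 ]; then
    echo "usage: derived_pyspark_dockerfile BASE_IMAGE PACKAGE..." >&2
    return 1
  fi
  local base="$1"
  shift
  cat <<EOF
FROM ${base}
# Package installation needs root.
USER 0
RUN apt-get update && \\
    apt-get install -y --no-install-recommends curl && \\
    rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir $*
# Return to the unprivileged Spark user.
USER 185
EOF
}
```

For example, redirecting the output of derived_pyspark_dockerfile docker.io/myrepo/spark-py:3.1.1-jdk8 pandas pyarrow into Dockerfile.pandas and then running docker build -f Dockerfile.pandas -t docker.io/myrepo/spark-py-pandas:3.1.1-jdk8 . would produce a hypothetical spark-py-pandas image. The package list is illustrative only, not a vetted set of Koalas dependencies.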