Today I had a discussion with a lead developer on a client site about the choice between Scala and PySpark for Spark.
His team was not doing data science, and he reluctantly agreed that they were using PySpark for ETL. In mitigation, he mentioned that he is the only one in his team who is an expert in Scala (his words) and that the rest are Python-savvy.

It shows again that functionality is at times sacrificed in favour of the availability of resources, and it reaffirms what some members were saying about choosing the technology based on TCO, favouring Python over Scala.

HTH,

Mich

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Fri, 9 Oct 2020 at 21:56, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I have come across occasions when teams use Python with Spark for ETL,
> for example processing data from S3 buckets into Snowflake with Spark.
>
> The only reason I think they are choosing Python as opposed to Scala is
> that they are more familiar with Python. The fact that Spark is itself
> written in Scala is one reason why I think Scala has an edge.
>
> I have not done a one-to-one comparison of Spark with Scala vs Spark with
> Python. I understand that for data science purposes most libraries like
> TensorFlow etc. are primarily used from Python, but I am at a loss to
> understand the validity of using Python with Spark for ETL purposes.
>
> This is my understanding rather than established fact, so I would like
> to get some informed views on this if I can.
>
> Many thanks,
>
> Mich
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw