I am not quite sure how meaningful this discussion is, but in case someone really is faced with this question, the question still is: what is the use case? I am a bit confused by the one-size-fits-all, deterministic approach here; I thought those days were over almost 10 years ago.

Regards,
Gourav

On Sat, 10 Oct 2020, 21:24 Stephen Boesch, <java...@gmail.com> wrote:

> I agree with Wim's assessment of data engineering / ETL vs Data Science.
> I wrote pipelines/frameworks for large companies and Scala was a much
> better choice. But for ad-hoc work interfacing directly with data science
> experiments, PySpark presents less friction.
>
> On Sat, 10 Oct 2020 at 13:03, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Many thanks everyone for their valuable contribution.
>>
>> We all started with Spark a few years ago, when Scala was the talk of the
>> town. I agree with the note that as long as Spark stayed niche and elite,
>> someone with Scala knowledge could attract a premium. In fairness, in
>> 2014-2015 there was not much talk of Data Science input (I may be wrong).
>> But the world has moved on, so to speak. Python itself has been around
>> a long time (long being relative here). Most people knew UNIX shell,
>> C, Python or Perl, or some combination of these. I recall we had a director
>> a few years ago who asked our Hadoop admin for the root password to log in to
>> the edge node. Later he became head of machine learning somewhere else, and
>> he loved C and Python. So Python was a gift in disguise. I think Python
>> appeals to those who are very familiar with the CLI and shell programming (not
>> GUI fans). As some members alluded to, there are more people around with
>> Python knowledge. Most managers choose Python as the unifying development
>> tool because they feel comfortable with it. Frankly, I have not seen a
>> manager who feels at home with Scala. So in summary, it is a bit
>> disappointing to abandon Scala and switch to Python just for the sake of it.
>>
>> Disclaimer: These are opinions and not facts, so to speak :)
>>
>> Cheers,
>>
>> Mich
>>
>> On Fri, 9 Oct 2020 at 21:56, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> I have come across occasions when teams use Python with Spark for
>>> ETL, for example processing data from S3 buckets into Snowflake with Spark.
>>>
>>> The only reason I can see for choosing Python as opposed to Scala is
>>> that they are more familiar with Python. The fact that Spark itself is
>>> written in Scala is one reason I think Scala has an edge.
>>>
>>> I have not done a one-to-one comparison of Spark with Scala vs Spark with
>>> Python. I understand that for data science purposes most libraries, like
>>> TensorFlow etc., are written in Python, but I am at a loss to understand the
>>> validity of using Python with Spark for ETL purposes.
>>>
>>> This is my understanding rather than established fact, so I would like to
>>> get some informed views on this if I can.
>>>
>>> Many thanks,
>>>
>>> Mich