What is the use case?
Unless you have unlimited funding and time to waste, you would usually start
with that.

Regards,
Gourav

On Fri, Oct 9, 2020 at 10:29 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> Spark in Scala (or Java) is much more performant if you are using RDDs:
> those operations basically force you to pass lambdas, pay for serialization
> between Java and Python types, and, yes, hit the Global Interpreter Lock. But
> none of those things apply to DataFrames, which will generate Java code
> regardless of which language you use to describe the DataFrame operations, as
> long as you don't use Python lambdas. A DataFrame operation without Python
> lambdas should not require any remote Python code execution.
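
To make the distinction above concrete, here is a minimal toy sketch of why a declarative column expression can be handed to an engine as plain data while a lambda cannot. The names Col, Add and can_run_without_python are hypothetical and are not part of PySpark; the sketch only models the idea and does not reproduce Spark's actual planner.

```python
# Toy model only: these classes are hypothetical and are not part of PySpark.
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    """A named column reference."""
    name: str

    def __add__(self, other):
        # Adding to a column builds a tree node instead of computing a value.
        return Add(self, other)

    def describe(self):
        return self.name


@dataclass(frozen=True)
class Add:
    """An addition node; `right` may be a Col or a plain literal."""
    left: object
    right: object

    def describe(self):
        right = self.right.describe() if hasattr(self.right, "describe") else repr(self.right)
        return f"({self.left.describe()} + {right})"


def can_run_without_python(expr):
    """True if the expression is pure data that an engine could plan itself."""
    return hasattr(expr, "describe")


declarative = Col("price") + 1          # builds a tree, computes nothing
opaque = lambda row: row["price"] + 1   # only the Python interpreter can run this

print(declarative.describe())
print(can_run_without_python(declarative), can_run_without_python(opaque))
```

The declarative form can be rendered as a description and shipped elsewhere for planning; the lambda is a black box that has to be executed by a Python process.
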
>
> TL;DR: if you are using DataFrames, it doesn't matter whether you use Scala,
> Java, Python, R, or SQL; the planning and work will all happen in the JVM.
>
> As for a REPL, you can run pyspark (the PySpark shell), which will start one
> up. There is also a slew of notebooks that provide interactive Python
> environments.
>
>
> On Fri, Oct 9, 2020 at 4:19 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks
>>
>> So, ignoring Python lambdas, is an individual's familiarity with the
>> language the most important factor? Also, I have noticed that the Spark
>> documentation now shows Python rather than Scala as the first example.
>> However, some code, for example JDBC calls, is the same for Scala and
>> Python.
>>
>> Some sources, like this website
>> <https://www.kdnuggets.com/2018/05/apache-spark-python-scala.html#:~:text=Scala%20is%20frequently%20over%2010,languages%20are%20faster%20than%20interpreted.>,
>> claim that Scala performance is an order of magnitude better than Python
>> and that Scala is also the better choice when it comes to concurrency.
>> Maybe the article is simply old (2018)?
>>
>> Also (and this may be my ignorance, as I have not researched it), does
>> Spark offer a REPL for Python along the lines of spark-shell?
>>
>>
>> Regards,
>>
>> Mich
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Fri, 9 Oct 2020 at 21:59, Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>>> As long as you don't use Python lambdas in your Spark job, there should
>>> be almost no difference between the Scala and Python DataFrame code. Once
>>> you introduce Python lambdas, you will hit some significant serialization
>>> penalties and will also have to run the actual work code in Python. As
>>> long as no lambdas are used, everything will run as Catalyst-compiled Java
>>> code, so there won't be a big difference between Python and Scala.
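
As a rough illustration of the serialization penalty described above, the following stdlib-only sketch imitates the round trip a batch of rows makes when a Python lambda has to process it: serialize, rebuild as Python objects, run the lambda, serialize the results again. The helper run_python_lambda is hypothetical, and pickle here merely stands in for the hop between the engine and a Python worker; this is not PySpark code.

```python
# Toy model only: stands in for the JVM <-> Python worker hop, not PySpark itself.
import pickle


def run_python_lambda(rows, func):
    """Apply `func` to rows the way an out-of-process Python worker would."""
    shipped = pickle.dumps(rows)               # engine side: serialize the batch
    received = pickle.loads(shipped)           # worker side: rebuild Python objects
    results = [func(row) for row in received]  # the lambda runs in the interpreter
    returned = pickle.dumps(results)           # worker side: serialize the results
    return pickle.loads(returned), len(shipped) + len(returned)


rows = [{"id": i, "price": float(i)} for i in range(1000)]
doubled, bytes_moved = run_python_lambda(rows, lambda row: row["price"] * 2)

print(doubled[:3], bytes_moved)
```

Every byte counted in bytes_moved is work that a lambda-free plan would not need to do, which is the intuition behind avoiding Python lambdas where a built-in expression will do.
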
>>>
>>> On Fri, Oct 9, 2020 at 3:57 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> I have come across teams that use Python with Spark for ETL, for example
>>>> to process data from S3 buckets into Snowflake.
>>>>
>>>> The only reason I can see for choosing Python over Scala is that they
>>>> are more familiar with Python. The fact that Spark itself is written in
>>>> Scala is one reason I think Scala has an edge.
>>>>
>>>> I have not done a one-to-one comparison of Spark with Scala vs. Spark with
>>>> Python. I understand that for data science purposes most libraries, such
>>>> as TensorFlow, are written in Python, but I am at a loss to understand the
>>>> validity of using Python with Spark for ETL purposes.
>>>>
>>>> This is my understanding rather than established fact, so I would like
>>>> to get some informed views on this, if I can.
>>>>
>>>> Many thanks,
>>>>
>>>> Mich
>>>
