Spark is written in Scala, so yes, the Scala API is still the strongest
option.  With Scala you also get the typed Dataset API (compile-time type
safety), which is not available in Python.

That said, I think the Python API is a viable choice if you already use
pandas for data science.  The Spark DataFrame API resembles the pandas API
in places, and you can convert a Spark DataFrame to a pandas DataFrame with
toPandas().

On Mon, Nov 21, 2016 at 1:51 PM, Brandon White <bwwintheho...@gmail.com>
wrote:

> Hello all,
>
> I will be starting a new Spark codebase and I would like to get opinions
> on using Python over Scala. Historically, the Scala API has always been the
> strongest interface to Spark. Is this still true? Are there still many
> benefits and additional features in the Scala API that are not available in
> the Python API? Are there any performance concerns using the Python API
> that do not exist when using the Scala API? Anything else I should know
> about?
>
> I appreciate any insight you have on using the Scala API over the Python
> API.
>
> Brandon
>
