Re: Faster PySpark UDFs using Apache Arrow in Spark 2.3.0

Phillip Cloud Mon, 30 Oct 2017 14:23:41 -0700

Congrats Li! This is awesome.

On Mon, Oct 30, 2017 at 2:05 PM Wes McKinney <[email protected]> wrote:


> hi all,
>
> One of our newest committers, Li Jin, has been driving efforts to
> speed up Python UDFs in Spark using Arrow. This was just written about
> today:
>
>
> https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
>
> It's really exciting to see this kind of cross-project collaboration
> bear fruit, and it validates our efforts hardening the Arrow
> implementations so that such work can be seen through in real-world
> analytics applications. We had previously been working with the Spark
> community purely on IO / data access by improving the performance of
> the toPandas function for Spark data frames in Python
> (http://arrow.apache.org/blog/2017/07/26/spark-arrow/).
>
> Congrats Li and all other involved individuals from the Arrow and
> Spark communities for their hard work on this! It is surely just the
> beginning of much exciting Arrow-related work up ahead.
>
> - Wes
>
