Re: Faster PySpark UDFs using Apache Arrow in Spark 2.3.0

Jacques Nadeau Wed, 08 Nov 2017 14:26:13 -0800

Totally awesome. Nice job Li and everyone else!

On Mon, Oct 30, 2017 at 2:22 PM, Phillip Cloud <[email protected]> wrote:


> Congrats Li! This is awesome.
>
> On Mon, Oct 30, 2017 at 2:05 PM Wes McKinney <[email protected]> wrote:
>
> > hi all,
> >
> > One of our newest committers, Li Jin, has been driving efforts to
> > speed up Python UDFs in Spark using Arrow. This was just written about
> > today:
> >
> >
> > https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
> >
> > It's really exciting to see this kind of cross-project collaboration
> > bear fruit, and it validates our efforts hardening the Arrow
> > implementations so that such work can be seen through in real world
> > analytics applications. We had previously been working with the Spark
> > community purely on IO / data access by improving the performance of
> > the toPandas function for Spark data frames in Python
> > (http://arrow.apache.org/blog/2017/07/26/spark-arrow/).
> >
> > Congrats Li and all other involved individuals from the Arrow and
> > Spark communities for their hard work on this! It is surely just the
> > beginning of much exciting Arrow-related work up ahead.
> >
> > - Wes
> >
>
