Congrats Li! This is awesome.

On Mon, Oct 30, 2017 at 2:05 PM Wes McKinney <wesmck...@gmail.com> wrote:
> hi all,
>
> One of our newest committers, Li Jin, has been driving efforts to
> speed up Python UDFs in Spark using Arrow. This was just written about
> today:
>
> https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
>
> It's really exciting to see this kind of cross-project collaboration
> bear fruit, and it validates our efforts hardening the Arrow
> implementations so that such work can be seen through in real world
> analytics applications. We had previously been working with the Spark
> community purely on IO / data access by improving the performance of
> the toPandas function for Spark data frames in Python
> (http://arrow.apache.org/blog/2017/07/26/spark-arrow/).
>
> Congrats Li and all other involved individuals from the Arrow and
> Spark communities for their hard work on this! It is surely just the
> beginning of much exciting Arrow-related work up ahead.
>
> - Wes