Hi,
As far as I know, these are not standard functions.

Writing UDFs is easy, but only in Java and Scala are they as efficient as 
built-in functions. With Python, data still has to be moved and converted 
to and from Arrow, and that makes a difference in performance. That was the 
motivation behind these two functions.
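
For reference, the intended semantics of the two functions described in the 
quoted message below can be sketched in plain Python. This is only an 
illustration, not the code from the pull request: the 1-based indexing is an 
assumption borrowed from array_position, the `base` parameter is hypothetical, 
and returning None for out-of-range indexes is a guess, since the thread does 
not specify that case.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def array_allpositions(arr: Sequence[T], value: T, base: int = 1) -> List[int]:
    """Return the indexes of every element of `arr` equal to `value`."""
    # base=1 mirrors array_position's 1-based convention (an assumption here).
    return [i + base for i, elem in enumerate(arr) if elem == value]


def array_select(
    arr: Sequence[T], indexes: Sequence[int], base: int = 1
) -> List[Optional[T]]:
    """Return the elements of `arr` located at the given `indexes`."""
    result: List[Optional[T]] = []
    for i in indexes:
        pos = i - base
        # Out-of-range indexes yield None in this sketch (a guess, see above).
        result.append(arr[pos] if 0 <= pos < len(arr) else None)
    return result


# Find where "a" occurs in one array, then pick the matching
# elements out of a second, parallel array.
tags = ["a", "b", "a", "c"]
scores = [10, 20, 30, 40]

positions = array_allpositions(tags, "a")
selected = array_select(scores, positions)
print(positions, selected)
```

Chaining the two in this way, filtering one array by the positions found in 
another, is the kind of use the description suggests.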

I'd object to the rule against implementing functions that are not found 
anywhere else, but there seems to be a consensus around it, so I'll just close 
the JIRA.

Thanks,
Petar


Sean Owen <sro...@gmail.com> writes:

> Is it standard SQL or implemented in Hive? Because UDFs are so relatively 
> easy in Spark we don't need tons of builtins like an RDBMS does. 
>
> On Tue, Feb 5, 2019, 7:43 AM Petar Zečević <petar.zece...@gmail.com> wrote:
>
>  Hi everybody,
>  I finally created the JIRA ticket and the pull request for the two array 
>  indexing functions:
>  https://issues.apache.org/jira/browse/SPARK-26826
>
>  Can any of the committers please check it out?
>
>  Thanks,
>  Petar
>
>  Petar Zečević <petar.zece...@gmail.com> writes:
>
>  > Hi,
>  > I implemented two array functions that are useful to us and I wonder if 
>  > you think it would be useful to add them to the distribution. The functions 
>  > are used for filtering arrays based on indexes:
>  >
>  > array_allpositions (named after array_position) - takes a column and a 
>  > value and returns an array of the column's indexes corresponding to elements 
>  > equal to the provided value
>  >
>  > array_select - takes an array column and an array of indexes and returns a 
>  > subset of the array based on the provided indexes.
>  >
>  > If you agree with this addition I can create a JIRA ticket and a pull 
>  > request.
>
>  ---------------------------------------------------------------------
>  To unsubscribe e-mail: dev-unsubscr...@spark.apache.org




