Hi, as far as I know these are not standard functions.
Writing UDFs is easy, but only in Java and Scala are they as efficient as built-in functions. When using Python, data still has to be moved and converted to and from Arrow, and that makes a difference in performance. That was the motivation behind these two functions.

I'd object to the rule of not implementing functions that are not found anywhere else, but there seems to be a consensus around this, so I'll just close the JIRA.

Thanks,
Petar

Sean Owen <sro...@gmail.com> writes:

> Is it standard SQL or implemented in Hive? Because UDFs are so relatively
> easy in Spark we don't need tons of builtins like an RDBMS does.
>
> On Tue, Feb 5, 2019, 7:43 AM Petar Zečević <petar.zece...@gmail.com> wrote:
>
>> Hi everybody,
>> I finally created the JIRA ticket and the pull request for the two array
>> indexing functions:
>> https://issues.apache.org/jira/browse/SPARK-26826
>>
>> Can any of the committers please check it out?
>>
>> Thanks,
>> Petar
>>
>> Petar Zečević <petar.zece...@gmail.com> writes:
>>
>>> Hi,
>>> I implemented two array functions that are useful to us and I wonder if
>>> you think it would be useful to add them to the distribution. The functions
>>> are used for filtering arrays based on indexes:
>>>
>>> array_allpositions (named after array_position) - takes a column and a
>>> value and returns an array of the column's indexes corresponding to elements
>>> equal to the provided value
>>>
>>> array_select - takes an array column and an array of indexes and returns a
>>> subset of the array based on the provided indexes.
>>>
>>> If you agree with this addition I can create a JIRA ticket and a pull
>>> request.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
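
For readers following the thread, the semantics described for array_allpositions and array_select can be sketched in plain Python on ordinary lists. This is only an illustration, not the code from the pull request: the 1-based indexing (chosen here to match array_position) and the choice to raise an error on out-of-range indexes are assumptions, since the thread does not specify either.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def array_allpositions(arr: Sequence[T], value: T) -> List[int]:
    """Return the positions of every element equal to `value`.

    Assumption: positions are 1-based, like Spark's array_position.
    """
    return [pos for pos, item in enumerate(arr, start=1) if item == value]


def array_select(arr: Sequence[T], indexes: Sequence[int]) -> List[T]:
    """Return the elements of `arr` found at the given positions.

    Assumption: positions are 1-based and an out-of-range position is an
    error; the thread does not say how the real functions handle this.
    """
    selected = []
    for pos in indexes:
        if not 1 <= pos <= len(arr):
            raise IndexError(f"position {pos} is outside 1..{len(arr)}")
        selected.append(arr[pos - 1])
    return selected


# Use the positions found in one array to pick elements from another.
positions = array_allpositions(["a", "b", "a", "c"], "a")
picked = array_select([10, 20, 30, 40], positions)
```

Wrapping functions like these as Python UDFs would incur the serialization cost mentioned in the reply above, which is why built-in implementations were proposed.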