Dear Spark users,
Given a DataFrame df with a column named "foo bar" (note the space), I can
call a Spark SQL built-in function on it like so:
df.select(functions.max(df("foo bar")))
However, if I want to apply a Hive UDF named myCustomFunction, I need to
write
df.selectExpr("myCustomFunction(`foo bar`)")
which forces me to escape the column name myself so that it forms a
well-formed SQL expression. Is there a programmatic way to invoke a Hive
function by name, so that I don’t have to worry about escaping?
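For reference, the manual escaping I'm trying to avoid looks roughly like the sketch below. It assumes the SQL parser treats a doubled backtick inside a backtick-quoted identifier as a literal backtick, which I have not verified for every Spark version. ColumnEscaping, quoteIdentifier and udfCallExpr are names I made up for illustration; they are not part of Spark.

```scala
// Hypothetical helper, not part of Spark: builds the expression string
// that would be passed to df.selectExpr.
object ColumnEscaping {
  // Wrap a column name in backticks, doubling any embedded backtick.
  // Assumption: the parser reads `` inside a quoted identifier as one literal backtick.
  def quoteIdentifier(name: String): String =
    "`" + name.replace("`", "``") + "`"

  // Build "udfName(`col1`, `col2`, ...)" from a function name and raw column names.
  def udfCallExpr(udfName: String, columns: String*): String =
    columns.map(quoteIdentifier).mkString(udfName + "(", ", ", ")")
}
```

With this, the call above would become df.selectExpr(ColumnEscaping.udfCallExpr("myCustomFunction", "foo bar")), but I would still be building SQL strings by hand rather than working with Column objects.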
Ideally, I’d like to do something like
val myCustomFunction = functions.udf("myCustomFunction")
df.select(myCustomFunction(df("foo bar")))
… but I couldn’t find any such API.
Regards,
Punya