Hi,

In the Java Spark DataFrames API, you can create a UDF, register it, and
then access it by string name using the convenience UDF classes in
org.apache.spark.sql.api.java
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/api/java/package-summary.html>.

Example:

UDF1<String, Long> testUdf1 = new UDF1<String, Long>() { ... };

sqlContext.udf().register("testfn", testUdf1, DataTypes.LongType);

DataFrame df2 = df.withColumn("new_col", functions.callUDF("testfn",
df.col("old_col")));

However, I'd like to avoid registering these by name, if possible, since I
have many of them and would need to deal with name conflicts.
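
One workaround would be to generate a unique name for each UDF so that
registrations cannot collide. The following is only a minimal sketch of
that idea: UdfNames and unique() are hypothetical helper names, not part
of Spark, and the only library class used is java.util.UUID.

```java
import java.util.UUID;

// Hypothetical helper (not part of Spark): builds a collision-resistant
// name for each UDF before it is registered.
final class UdfNames {
    private UdfNames() {}

    // Returns the prefix, an underscore, and the 32 hex digits of a random
    // UUID (hyphens removed so the result stays a plain identifier).
    static String unique(String prefix) {
        return prefix + "_" + UUID.randomUUID().toString().replace("-", "");
    }
}
```

The generated name would then be passed both to
sqlContext.udf().register(...) and to functions.callUDF(...). That still
goes through the name registry, though, which is what I was hoping to
avoid.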

There are udf() methods like this that seem to be from the Scala API
<https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#udf(scala.Function1,%20scala.reflect.api.TypeTags.TypeTag,%20scala.reflect.api.TypeTags.TypeTag)>,
where you don't have to register everything by name first.

However, using those methods from Java would require interacting with
Scala's scala.reflect.api.TypeTags.TypeTag. I'm having a hard time figuring
out how to create a TypeTag from Java.

Does anyone have an example of using the udf() methods from Java?

Thanks!

- Everett
