Sorry for the spam - I had some success:

  case class ScoringDF(function: Row => Double) extends Expression {
    val dataType = DataTypes.DoubleType
    override type EvaluatedType = Double
    override def eval(input: Row): EvaluatedType = { function(input) }
    override def nullable: Boolean = false
    override def children: Seq[Expression] = Nil
  }

But this falls over if I want to return an Array[Double]:

  case class ScoringDF(function: Row => Array[Double]) extends Expression {
    val dataType = DataTypes.createArrayType(DataTypes.DoubleType)
    override type EvaluatedType = Array[Double]
    override def eval(input: Row): EvaluatedType = { function(input) }
    override def nullable: Boolean = false
    override def children: Seq[Expression] = Nil
  }

I get the following exception:

  scala> dfs.show
  java.lang.ClassCastException: [D cannot be cast to scala.collection.Seq
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$2.apply(CatalystTypeConverters.scala:282)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$.convertRowWithConverters(CatalystTypeConverters.scala:348)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$4.apply(CatalystTypeConverters.scala:301)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$2.apply(SparkPlan.scala:150)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$2.apply(SparkPlan.scala:150)

Any ideas?
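One guess from staring at the stack trace: [D is the JVM's name for double[], and CatalystTypeConverters is trying to cast the value to scala.collection.Seq, so it looks like Catalyst wants a Seq rather than a raw array for an ArrayType column. A rough sketch of what I'll try next (unverified on this Spark version):

  case class ScoringDF(function: Row => Array[Double]) extends Expression {
    val dataType = DataTypes.createArrayType(DataTypes.DoubleType)
    // Hand Catalyst a Seq[Double] rather than an Array[Double]; the
    // to-Scala converter for ArrayType casts the value to Seq, which
    // is exactly where the ClassCastException above is thrown.
    override type EvaluatedType = Seq[Double]
    override def eval(input: Row): EvaluatedType = function(input).toSeq
    override def nullable: Boolean = false
    override def children: Seq[Expression] = Nil
  }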
On Tue, Sep 8, 2015 at 5:47 PM, Night Wolf <nightwolf...@gmail.com> wrote:

> So basically I need something like
>
>   df.withColumn("score", new Column(new Expression {
>     ...
>     def eval(input: Row = null): EvaluatedType = myModel.score(input)
>     ...
>   }))
>
> But I can't do this, so how can I make a UDF, or something like it, that
> can take in a Row and pass back a double value or some struct...
>
> On Tue, Sep 8, 2015 at 5:33 PM, Night Wolf <nightwolf...@gmail.com> wrote:
>
>> Not sure how that would work. Really I want to tack an extra column
>> onto the DF with a UDF that can take a Row object.
>>
>> On Tue, Sep 8, 2015 at 1:54 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Can you use a map or list with different properties as one parameter?
>>> Alternatively a string where parameters are comma-separated...
>>>
>>> On Mon, Sep 7, 2015 at 8:35 AM, Night Wolf <nightwolf...@gmail.com> wrote:
>>>
>>>> Is it possible to have a UDF which takes a variable number of arguments?
>>>>
>>>> e.g. df.select(myUdf($"*")) fails with
>>>>
>>>>   org.apache.spark.sql.AnalysisException: unresolved operator 'Project
>>>>   [scalaUDF(*) AS scalaUDF(*)#26];
>>>>
>>>> What I would like to do is pass in a generic DataFrame which can then
>>>> be passed to a UDF that scores a model. The UDF needs to know the
>>>> schema in order to map column names in the model to columns in the
>>>> DataFrame.
>>>>
>>>> The model has 100s of factors (very wide), so I can't just have a
>>>> scoring UDF with 500 parameters (for obvious reasons).
>>>>
>>>> Cheers,
>>>> ~N
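PS - coming back to the variable-arguments question quoted above: one pattern that might work (a sketch only, I haven't verified it against this Spark version) is to pack all the columns into a single struct column with struct() and define the UDF over a Row. Assuming myModel.score(row: Row): Double as in my earlier mail:

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.functions.{col, struct, udf}

  // struct() collapses every column of the DataFrame into one
  // StructType column, so the UDF receives a single Row argument
  // instead of needing 500 separate parameters.
  val scoreUdf = udf((r: Row) => myModel.score(r))

  val scored = df.withColumn("score", scoreUdf(struct(df.columns.map(col): _*)))

If the Row handed to the UDF keeps the struct's schema (I believe it does), the model can resolve its factors by name with r.fieldIndex("factorName").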