Sorry for the spam - I had some success:

case class ScoringDF(function: Row => Double) extends Expression {
  val dataType = DataTypes.DoubleType

  override type EvaluatedType = Double

  override def eval(input: Row): EvaluatedType = {
    function(input)
  }

  override def nullable: Boolean = false

  override def children: Seq[Expression] = Nil
}
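
For reference, I'm wiring this into the DataFrame roughly like so (a sketch only - assumes Spark 1.x, where a Column can wrap a Catalyst Expression directly; the scoring function here is just a stand-in):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.Row

// Hypothetical scorer standing in for the real model
val scoreFn: Row => Double = row => row.getDouble(0) * 2.0

val scored = df.withColumn("score", new Column(ScoringDF(scoreFn)))
```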

But this falls over if I want to return an Array[Double]:

case class ScoringDF(function: Row => Array[Double]) extends Expression {
  val dataType = DataTypes.createArrayType(DataTypes.DoubleType)

  override type EvaluatedType = Array[Double]

  override def eval(input: Row): EvaluatedType = {
    function(input)
  }

  override def nullable: Boolean = false

  override def children: Seq[Expression] = Nil
}


I get the following exception:

scala> dfs.show
java.lang.ClassCastException: [D cannot be cast to scala.collection.Seq
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$2.apply(CatalystTypeConverters.scala:282)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$.convertRowWithConverters(CatalystTypeConverters.scala:348)
  at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$4.apply(CatalystTypeConverters.scala:301)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$2.apply(SparkPlan.scala:150)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeTake$2.apply(SparkPlan.scala:150)

Any ideas?
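
(My guess at the cause: the `[D` in the message is a primitive `Array[Double]`, and Catalyst's ArrayType-to-Scala converter casts the evaluated value to `scala.collection.Seq`, which a primitive array is not. Returning a `Seq[Double]` instead may avoid the cast - untested sketch:)

```scala
case class ScoringDF(function: Row => Seq[Double]) extends Expression {
  val dataType = DataTypes.createArrayType(DataTypes.DoubleType)

  override type EvaluatedType = Seq[Double]

  // Catalyst appears to expect ArrayType values as a Seq, not a raw Array ([D)
  override def eval(input: Row): EvaluatedType = function(input)

  override def nullable: Boolean = false

  override def children: Seq[Expression] = Nil
}
```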

On Tue, Sep 8, 2015 at 5:47 PM, Night Wolf <nightwolf...@gmail.com> wrote:

> So basically I need something like
>
> df.withColumn("score", new Column(new Expression {
>  ...
>
> def eval(input: Row = null): EvaluatedType = myModel.score(input)
> ...
>
> }))
>
> But I can't do this, so how can I make a UDF or something like it, that
> can take in a Row and pass back a double value or some struct...
>
> On Tue, Sep 8, 2015 at 5:33 PM, Night Wolf <nightwolf...@gmail.com> wrote:
>
>> Not sure how that would work. Really I want to tack on an extra column
>> onto the DF with a UDF that can take a Row object.
>>
>> On Tue, Sep 8, 2015 at 1:54 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Can you use a map or list with different properties as one parameter?
>>> Alternatively a string where parameters are comma-separated...
>>>
>>> Le lun. 7 sept. 2015 à 8:35, Night Wolf <nightwolf...@gmail.com> a
>>> écrit :
>>>
>>>> Is it possible to have a UDF which takes a variable number of arguments?
>>>>
>>>> e.g. df.select(myUdf($"*")) fails with
>>>>
>>>> org.apache.spark.sql.AnalysisException: unresolved operator 'Project
>>>> [scalaUDF(*) AS scalaUDF(*)#26];
>>>>
>>>> What I would like to do is pass in a generic data frame which can be
>>>> then passed to a UDF which does scoring of a model. The UDF needs to know
>>>> the schema to map column names in the model to columns in the DataFrame.
>>>>
>>>> The model has 100s of factors (very wide), so I can't just have a
>>>> scoring UDF that has 500 parameters (for obvious reasons).
>>>>
>>>> Cheers,
>>>> ~N
>>>>
>>>
>>
>
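
(On the variable-argument question at the bottom of the thread: one possible workaround - assuming a Scala UDF given a struct column receives it as a `Row` - is to pack all columns into a single struct, so the UDF takes one argument regardless of how wide the DataFrame is. Untested sketch; the scoring body is a placeholder:)

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, struct, udf}

// Placeholder scorer: a real one would map column names to model factors
// via the struct's schema (r.schema) rather than summing doubles.
val scoreUdf = udf((r: Row) => r.toSeq.collect { case d: Double => d }.sum)

val scored = df.withColumn("score", scoreUdf(struct(df.columns.map(col): _*)))
```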
