I am trying to port the following Python function to Java 8. I would like my
Java implementation to implement Transformer so I can use it in a pipeline.
I am having a heck of a time trying to figure out how to create a Column
variable I can pass to DataFrame.withColumn(). As far as I know, withColumn()
is the only way to append a column to a data frame.
Any comments or suggestions would be greatly appreciated.
Andy

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def convertMultinomialLabelToBinary(dataFrame):
    newColName = "binomialLabel"
    # Keep "noise" as is; collapse every other label to "signal".
    binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
    ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
    return ret

trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
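
For reference, the per-row logic the Java UDF has to carry out is small. Below is a minimal plain-Java sketch of that mapping, assuming no Spark classes at all; the class name LabelToBinary and the method toBinomialLabel are hypothetical names used only for illustration. It does not address how to build the Column, only what the UDF body would need to compute.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Spark.
class LabelToBinary {

    // Mirrors the Python lambda: keep "noise", map everything else to "signal".
    // A null label is treated like any other non-"noise" value.
    public static String toBinomialLabel(String labelStr) {
        return "noise".equals(labelStr) ? labelStr : "signal";
    }

    public static void main(String[] args) {
        // Made-up sample labels, for illustration only.
        List<String> labels = Arrays.asList("noise", "chirp", "noise", "whistle");
        List<String> binomial = labels.stream()
                .map(LabelToBinary::toBinomialLabel)
                .collect(Collectors.toList());
        System.out.println(binomial); // prints [noise, signal, noise, signal]
    }
}
```

In Spark's Java API a function like this would presumably be wrapped in a UDF1<String, String>; how that wrapper becomes a Column is the open question in the class below.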

public class LabelToBinaryTransformer extends Transformer {
    private static final long serialVersionUID = 4202800448830968904L;
    private final UUID uid = UUID.randomUUID();
    public String inputCol;
    public String outputCol;

    @Override
    public String uid() {
        return uid.toString();
    }

    @Override
    public Transformer copy(ParamMap pm) {
        Params xx = defaultCopy(pm);
        return ???;
    }

    @Override
    public DataFrame transform(DataFrame df) {
        MyUDF myUDF = new MyUDF(myUDF, null, null);
        Column c = df.col(inputCol);
        // ??? UDF apply does not take a col????
        Column col = myUDF.apply(df.col(inputCol));
        DataFrame ret = df.withColumn(outputCol, col);
        return ret;
    }

    @Override
    public StructType transformSchema(StructType arg0) {
        // ??? What is this function supposed to do???
        // ??? Is this the type of the new output column????
    }

    class MyUDF extends UserDefinedFunction {
        public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
            super(f, dataType, inputTypes);
            // ??? Why do I have to implement this constructor ???
            // ??? What are the arguments ???
        }

        @Override
        public Column apply(scala.collection.Seq<Column> exprs) {
            // What do you do with a scala seq?
            return ???;
        }
    }
}