I am trying to port the following Python function to Java 8. I would like my Java implementation to implement Transformer so I can use it in a pipeline.
I am having a heck of a time trying to figure out how to create a Column variable I can pass to DataFrame.withColumn(). As far as I know, withColumn() is the only way to append a column to a data frame.

Any comments or suggestions would be greatly appreciated.

Andy

def convertMultinomialLabelToBinary(dataFrame):
    newColName = "binomialLabel"
    binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal", StringType())
    ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
    return ret

trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)

public class LabelToBinaryTransformer extends Transformer {
    private static final long serialVersionUID = 4202800448830968904L;
    private final UUID uid = UUID.randomUUID();
    public String inputCol;
    public String outputCol;

    @Override
    public String uid() {
        return uid.toString();
    }

    @Override
    public Transformer copy(ParamMap pm) {
        Params xx = defaultCopy(pm);
        return ???;
    }

    @Override
    public DataFrame transform(DataFrame df) {
        MyUDF myUDF = new MyUDF(myUDF, null, null);
        Column c = df.col(inputCol);
        // ??? UDF apply does not take a col????
        Column col = myUDF.apply(df.col(inputCol));
        DataFrame ret = df.withColumn(outputCol, col);
        return ret;
    }

    @Override
    public StructType transformSchema(StructType arg0) {
        // ??? What is this function supposed to do???
        // ??? Is this the type of the new output column????
    }

    class MyUDF extends UserDefinedFunction {
        public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
            super(f, dataType, inputTypes);
            // ??? Why do I have to implement this constructor ???
            // ??? What are the arguments ???
        }

        @Override
        public Column apply(scala.collection.Seq<Column> exprs) {
            // What do you do with a scala seq?
            return ???;
        }
    }
}