I am trying to port the following Python function to Java 8. I would like my
Java implementation to implement Transformer so I can use it in a pipeline.

I am having a heck of a time trying to figure out how to create a Column
variable I can pass to DataFrame.withColumn(). As far as I know, withColumn() is
the only way to append a column to a data frame.

Any comments or suggestions would be greatly appreciated.

Andy


from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def convertMultinomialLabelToBinary(dataFrame):
    newColName = "binomialLabel"
    binomial = udf(lambda labelStr: labelStr if (labelStr == "noise") else "signal",
                   StringType())
    ret = dataFrame.withColumn(newColName, binomial(dataFrame["label"]))
    return ret

trainingDF2 = convertMultinomialLabelToBinary(trainingDF1)
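
The per-row logic itself is small. As a minimal sketch with no Spark dependency, this is what the Python lambda above computes, written in plain Java 8. The class name BinaryLabel and the method name toBinaryLabel are hypothetical, chosen only for illustration; this is not part of any Spark API and does not answer how to wrap the function in a Column.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class BinaryLabel {
    // Mirrors the Python lambda: keep "noise" as-is, map every other label to "signal".
    // A null label also maps to "signal", since "noise".equals(null) is false.
    static String toBinaryLabel(String label) {
        return "noise".equals(label) ? "noise" : "signal";
    }

    public static void main(String[] args) {
        // Hypothetical sample labels, for illustration only.
        List<String> labels = Arrays.asList("noise", "cat", "dog");
        List<String> binary = labels.stream()
                .map(BinaryLabel::toBinaryLabel)
                .collect(Collectors.toList());
        System.out.println(binary);
    }
}
```

What remains unclear to me is how to turn a function like this into something withColumn() will accept.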


public class LabelToBinaryTransformer extends Transformer {
    private static final long serialVersionUID = 4202800448830968904L;
    private final UUID uid = UUID.randomUUID();
    public String inputCol;
    public String outputCol;

    @Override
    public String uid() {
        return uid.toString();
    }

    @Override
    public Transformer copy(ParamMap pm) {
        Params xx = defaultCopy(pm);
        return ???;
    }

    @Override
    public DataFrame transform(DataFrame df) {
        MyUDF myUDF = new MyUDF(myUDF, null, null);
        Column c = df.col(inputCol);
        // ??? UDF apply does not take a col ???
        Column col = myUDF.apply(df.col(inputCol));
        DataFrame ret = df.withColumn(outputCol, col);
        return ret;
    }

    @Override
    public StructType transformSchema(StructType arg0) {
        // ??? What is this function supposed to do ???
        // ??? Is this the type of the new output column ???
    }

    class MyUDF extends UserDefinedFunction {
        public MyUDF(Object f, DataType dataType, Seq<DataType> inputTypes) {
            super(f, dataType, inputTypes);
            // ??? Why do I have to implement this constructor ???
            // ??? What are the arguments ???
        }

        @Override
        public Column apply(scala.collection.Seq<Column> exprs) {
            // ??? What do you do with a Scala Seq ???
            return ???;
        }
    }
}




