Bump. Can someone check whether this is actually reaching the group? I can't see my recent posts in the archives:
http://apache-spark-user-list.1001560.n3.nabble.com/

Is there a reason it would not show up here?

Thanks!

On Tue, Sep 6, 2016 at 11:28 AM Thunder Stumpges <thunder.stump...@gmail.com> wrote:

> Hi guys, Spark 1.6.1 here.
>
> I am trying to "DataFrame-ize" a complex function I have that currently
> operates on a DataSet, and returns another DataSet with a new "column"
> added to it. I'm trying to fit this into the new ML "Model" format where I
> can receive a DataFrame, ensure the input column exists, then perform my
> transform and append as a new column.
>
> From reviewing other ML Model code, the way I see this happen is typically
> using a UDF on the input to create the output. My problem is this requires
> the UDF to operate on each record one by one.
>
> In my case I am doing a chain of RDD/DataSet operations (flatMap, join
> with another cached RDD, run a calculation, reduce) on the original input
> column.
>
> How can I do this with DataFrames?
>
> thanks,
> Thunder