It sounds like you may want the Bucketizer in Spark ML. The overview docs [1] say, "Bucketizer transforms a column of continuous features to a column of feature buckets, where the buckets are specified by users."

[1]: http://spark.apache.org/docs/latest/ml-features.html#bucketizer
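For example, something along these lines in the spark-shell (a minimal sketch, not tested against your data: it assumes a SQLContext named sqlContext is in scope, a column called some_column, and made-up split points):

import org.apache.spark.ml.feature.Bucketizer

// Toy data standing in for the continuous column.
val df = sqlContext.createDataFrame(
  Seq(-0.5, -0.3, 0.0, 0.2, 0.9).map(Tuple1.apply)
).toDF("some_column")

// User-specified bucket boundaries; use -Infinity / Infinity as the outer
// endpoints if the min and max of the data are not known in advance.
val splits = Array(Double.NegativeInfinity, -0.3, 0.0, 0.3,
  Double.PositiveInfinity)

val bucketizer = new Bucketizer()
  .setInputCol("some_column")
  .setOutputCol("some_column_bucket")
  .setSplits(splits)

bucketizer.transform(df).show()

For the equal-frequency case you would compute the quantile boundaries yourself and pass them in as the splits (or have a look at QuantileDiscretizer in the same package, which picks approximately equal-frequency splits for you); Bucketizer is the one to reach for when you want to supply the boundaries yourself, e.g. cluster edges from k-means.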
On Mon, Jan 25, 2016 at 5:34 AM, Eli Super <eli.su...@gmail.com> wrote:
> Hi
>
> What is the best way to discretize a continuous variable within Spark
> DataFrames?
>
> I want to discretize some variable 1) by equal frequency 2) by k-means.
>
> I usually use R for this purpose:
>
> http://www.inside-r.org/packages/cran/arules/docs/discretize
>
> R code for example:
>
> ### equal frequency
> table(discretize(data$some_column, "frequency", categories=10))
>
> ### k-means
> table(discretize(data$some_column, "cluster", categories=10))
>
> Thanks a lot!

--
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/