It sounds like you may want the Bucketizer in Spark ML. The overview docs
[1] say: "Bucketizer transforms a column of continuous features to a
column of feature buckets, where the buckets are specified by users."

[1]: http://spark.apache.org/docs/latest/ml-features.html#bucketizer
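
To make "buckets specified by users" concrete, here is a rough sketch in
plain Python (not Spark code) of how I understand the bucketing to work:
n+1 split points define n buckets, each covering [lower, upper), with the
last bucket also including its upper bound. The function names
`bucketize` and `equal_frequency_splits` are made up for illustration.
For the equal-frequency case, one option is to compute quantile cut
points yourself and pass them in as the splits; treat this as a sketch
under those assumptions rather than what Spark does internally.

```python
from bisect import bisect_right
from statistics import quantiles


def bucketize(value, splits):
    """Return the index of the bucket that contains `value`.

    `splits` must be strictly increasing. Bucket i covers
    [splits[i], splits[i + 1]); the last bucket also includes its
    upper bound.
    """
    if value < splits[0] or value > splits[-1]:
        raise ValueError("value lies outside the range covered by splits")
    if value == splits[-1]:
        # The top boundary belongs to the last bucket.
        return len(splits) - 2
    # Count how many split points are <= value, then shift to a 0-based index.
    return bisect_right(splits, value) - 1


def equal_frequency_splits(values, n_buckets):
    """Hypothetical helper: quantile cut points padded with -inf/+inf.

    Heavily repeated values can produce duplicate cut points, which
    would violate the strictly increasing requirement above.
    """
    cuts = quantiles(values, n=n_buckets, method="inclusive")
    return [float("-inf")] + cuts + [float("inf")]
```

For example, `bucketize(x, equal_frequency_splits(column_values, 10))`
would assign each value to one of ten roughly equal-sized buckets, where
`column_values` stands in for the data collected from your column.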

On Mon, Jan 25, 2016 at 5:34 AM, Eli Super <eli.su...@gmail.com> wrote:

> Hi
>
> What is the best way to discretize a continuous variable within Spark
> DataFrames?
>
> I want to discretize some variable 1) by equal frequency 2) by k-means
>
> I usually use R for this purpose:
>
> http://www.inside-r.org/packages/cran/arules/docs/discretize
>
> R code for example :
>
> ### equal frequency
> table(discretize(data$some_column, "frequency", categories=10))
>
>
> #k-means
> table(discretize(data$some_column, "cluster", categories=10))
>
> Thanks a lot !
>



-- 
Joshua Taylor, http://www.cs.rpi.edu/~tayloj/
