Re: Design document - MLlib's statistical package for DataFrames

2017-02-16 Thread bradc
Hi, While it is also missing in spark.mllib, I'd suggest adding cardinality to the simple descriptive statistics for both spark.ml and spark.mllib. This is useful even for double-precision floating-point data, to understand the "uniqueness" of the feature data. Cheers, Brad
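
To illustrate what a per-feature cardinality statistic would report, here is a minimal plain-Python sketch. It is not Spark code and not a proposed MLlib API: the function name `column_cardinality` and the sample rows are hypothetical, and it counts exact distinct values in memory. On Spark DataFrames, the existing `countDistinct` and `approx_count_distinct` functions in `pyspark.sql.functions` compute a comparable figure per column.

```python
def column_cardinality(rows):
    """Return the number of distinct values in each column.

    `rows` is a list of equal-length tuples, one tuple per observation.
    (Hypothetical helper, for illustration only.)
    """
    if not rows:
        return []
    # One set per column collects the distinct values seen so far.
    distinct = [set() for _ in rows[0]]
    for row in rows:
        for seen, value in zip(distinct, row):
            seen.add(value)
    return [len(seen) for seen in distinct]


# Made-up feature data: three double-valued features, four observations.
features = [
    (0.5, 1.0, 3.25),
    (0.5, 2.0, 3.50),
    (0.5, 1.0, 3.75),
    (0.5, 2.0, 4.00),
]

cardinalities = column_cardinality(features)
print(cardinalities)  # [1, 2, 4]
```

In this sample, the first feature is constant (cardinality 1), the second behaves like a binary indicator, and the third is unique in every row, which is the kind of "uniqueness" signal the suggestion is after.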

Re: MLlib mission and goals

2017-01-24 Thread bradc
I believe one of the higher-level goals of Spark MLlib should be to improve the efficiency of the ML algorithms that already exist. Currently ML has reasonable coverage of the important core algorithms. The work to get to feature parity for the DataFrame-based API and model persistence are a