Hi,
Cardinality is missing from spark.mllib as well, so I'd suggest adding it to
the simple descriptive statistics in both spark.ml and spark.mllib.
This is useful even for double-precision floating-point data, to understand
the "uniqueness" of the feature values.
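To make the proposal concrete, here is a minimal sketch of the statistic: the number of distinct values in each feature column. It is plain Python, not an existing spark.ml or spark.mllib API, and the function name `column_cardinality` is hypothetical. It computes exact counts locally with one set per column. A distributed version would more likely use an approximate counter, e.g. Spark SQL's `approx_count_distinct`.

```python
import math
from typing import List, Sequence


def column_cardinality(rows: Sequence[Sequence[float]]) -> List[int]:
    """Return the number of distinct values in each column of `rows`.

    Hypothetical helper for illustration only. Counts are exact and use
    one Python set per column, so memory grows with the number of
    distinct values; this suits small, local data only.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    seen = [set() for _ in range(n_cols)]
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("all rows must have the same number of columns")
        for j, value in enumerate(row):
            # NaN != NaN, so map every NaN to a single key and count it once.
            # (0.0 and -0.0 compare equal, so the set treats them as one value.)
            seen[j].add("NaN" if math.isnan(value) else value)
    return [len(s) for s in seen]


rows = [[1.0, 0.5], [1.0, 0.25], [2.0, 0.5], [1.0, float("nan")], [2.0, float("nan")]]
print(column_cardinality(rows))  # [2, 3]
```

Dividing each count by the number of rows gives a rough "uniqueness" ratio per feature: a column of doubles with very low cardinality is effectively categorical.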
Cheers,
Brad
--
I believe one of the higher level goals of Spark MLlib should be to improve
the efficiency of the ML algorithms that already exist. Currently, ML
has reasonable coverage of the important core algorithms. The work to get
to feature parity for the DataFrame-based API and model persistence is a