Re: Design document - MLlib's statistical package for DataFrames

2017-02-18 Thread Holden Karau

witter: @nirvanainternat > > -Original Message- > From: Tim Hunter [mailto:timhun...@databricks.com] > Sent: Friday, February 17, 2017 1:49 PM > To: bradc > Cc: dev@spark.apache.org > Subject: Re: Design document - MLlib's statistical package for DataFrames > >

RE: Design document - MLlib's statistical package for DataFrames

2017-02-18 Thread Pritish Nawlakhe

-Original Message- From: Tim Hunter [mailto:timhun...@databricks.com] Sent: Friday, February 17, 2017 1:49 PM To: bradc Cc: dev@spark.apache.org Subject: Re: Design document - MLlib's statistical package for DataFrames Hi Brad, this task is focusing on moving the existing algor

Re: Design document - MLlib's statistical package for DataFrames

2017-02-17 Thread Tim Hunter

Hi Brad, this task is focusing on moving the existing algorithms, so that we are not held up by parity issues. Do you have some paper suggestions for cardinality? I do not think there is a feature request on JIRA either. Tim On Thu, Feb 16, 2017 at 2:21 PM, bradc wrote: > Hi, > > While it is also

Re: Design document - MLlib's statistical package for DataFrames

2017-02-16 Thread bradc

Hi, While it is also missing in spark.mllib, I'd suggest adding cardinality as part of the Simple descriptive statistics for both spark.ml and spark.mllib. This is useful even for data in double precision FP to understand the "uniqueness" of the feature data. Cheers, Brad
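
For readers of this thread, here is a minimal sketch of what a per-feature cardinality statistic would report. It is plain Python and does not use Spark or the API under discussion. The helper name `column_cardinality` and the sample data are hypothetical, chosen only for illustration. It counts exact distinct values per column. An implementation for large datasets would more likely use an approximate estimator, which is presumably why Tim asks for paper suggestions.

```python
def column_cardinality(rows):
    """Return the number of distinct values in each column of `rows`.

    `rows` is a list of equal-length tuples of floats (one tuple per record).
    This is an exact count kept in memory, so it is only a sketch of the
    statistic, not a scalable algorithm.
    """
    if not rows:
        return []
    distinct = [set() for _ in range(len(rows[0]))]
    for row in rows:
        for j, value in enumerate(row):
            distinct[j].add(value)
    return [len(values) for values in distinct]


# Hypothetical feature data: column 0 is constant, column 1 is binary,
# column 2 has a different value in every record.
features = [
    (0.5, 1.0, 3.25),
    (0.5, 2.0, 3.50),
    (0.5, 1.0, 3.75),
    (0.5, 2.0, 4.00),
]

cardinalities = column_cardinality(features)
print(cardinalities)  # [1, 2, 4]
```

A cardinality of 1 flags a constant feature, while a cardinality equal to the row count suggests a feature that behaves like an identifier. That is the kind of "uniqueness" information the suggestion is after.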