Re: Opening a JIRA for QuantileDiscretizer bug

2016-02-23 Thread Sean Owen
Good catch, though probably very slightly simpler to write math.min(requiredSamples.toDouble ... Make sure you're logged in to JIRA maybe. If you have any trouble I'll open it for you. You can file it as a minor bug against ML. This is how you open a PR and everything else https://cwiki.apache.o

Re: Opening a JIRA for QuantileDiscretizer bug

2016-02-22 Thread Ted Yu
When you click on Create, you're brought to the 'Create Issue' dialog, where you choose Spark as the Project. The Component should be MLlib. Please see also: http://search-hadoop.com/m/q3RTtmsshe1W6cH22/spark+pull+template&subj=pull+request+template On Mon, Feb 22, 2016 at 6:45 PM, Pierson, Oliver C wrote: >

Opening a JIRA for QuantileDiscretizer bug

2016-02-22 Thread Pierson, Oliver C
Hello, I've discovered a bug in the QuantileDiscretizer estimator. Specifically, for large DataFrames QuantileDiscretizer will only create one split (i.e. two bins). The error happens in lines 113 and 114 of QuantileDiscretizer.scala: val requiredSamples = math.max(numBins * numBins,
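
The preview above is cut off, so the exact faulty expression is not visible here. As a minimal sketch only: the reply's suggestion to use `requiredSamples.toDouble` hints that the sampling fraction may have been computed with integer division. The object name, function names, and numbers below are hypothetical and are not taken from QuantileDiscretizer.scala; they only illustrate how such a division truncates to zero once the row count exceeds the required sample count.

```scala
// Hypothetical illustration of an integer-division pitfall.
// None of these names or values come from QuantileDiscretizer.scala.
object SamplingFractionSketch {
  // Long / Long truncates toward zero, so the fraction becomes 0.0
  // whenever rowCount is larger than requiredSamples.
  def truncatedFraction(requiredSamples: Long, rowCount: Long): Double =
    math.min(requiredSamples / rowCount, 1.0)

  // Converting one operand to Double first keeps the fractional part.
  def doubleFraction(requiredSamples: Long, rowCount: Long): Double =
    math.min(requiredSamples.toDouble / rowCount, 1.0)

  def main(args: Array[String]): Unit = {
    val requiredSamples = 10000L // assumed value, for illustration only
    val rowCount = 1000000L      // a "large" dataset
    println(truncatedFraction(requiredSamples, rowCount)) // prints 0.0
    println(doubleFraction(requiredSamples, rowCount))    // prints 0.01
  }
}
```

If the fraction did collapse to zero in this way, sampling would return no rows, which would be consistent with the single split reported for large DataFrames.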