Good catch, though it's probably very slightly simpler to write
math.min(requiredSamples.toDouble ...
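To see why the .toDouble matters, here is a minimal Scala sketch. Line 114 is cut off in this thread, so the sketch assumes the sampling fraction divides the Int requiredSamples by the Long returned from dataset.count(). The object and method names are hypothetical and are not Spark's.

```scala
// Hypothetical illustration, not Spark code. Assumes requiredSamples is an
// Int and the row count is a Long, as returned by DataFrame.count().
object SampleFraction {
  // Buggy form: Int / Long is integer division, so the quotient is truncated
  // to a whole number (0 whenever rowCount > requiredSamples) before min.
  def truncatedFraction(requiredSamples: Int, rowCount: Long): Double =
    math.min((requiredSamples / rowCount).toDouble, 1.0)

  // Suggested form: convert to Double first so the fractional part survives.
  def sampleFraction(requiredSamples: Int, rowCount: Long): Double =
    math.min(requiredSamples.toDouble / rowCount, 1.0)
}
```

For example, with 10,000 required samples and 1,000,000 rows, the truncated form gives a fraction of 0.0 while the Double form gives 0.01. A fraction of 0 would sample essentially nothing, which is consistent with the single split reported below for large DataFrames.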
You may need to be logged in to JIRA first. If you have any trouble, I'll
open it for you. You can file it as a minor bug against ML.
This page explains how to open a PR and everything else:
https://cwiki.apache.o
When you click Create, you're brought to the 'Create Issue' dialog, where you
choose the project Spark.
The component should be MLlib.
Please also see:
http://search-hadoop.com/m/q3RTtmsshe1W6cH22/spark+pull+template&subj=pull+request+template
On Mon, Feb 22, 2016 at 6:45 PM, Pierson, Oliver C wrote:
Hello,
I've discovered a bug in the QuantileDiscretizer estimator. Specifically,
for large DataFrames, QuantileDiscretizer will only create one split (i.e., two
bins).
The error happens in lines 113 and 114 of QuantileDiscretizer.scala:
val requiredSamples = math.max(numBins * numBins,