When you click on Create, you're brought to the 'Create Issue' dialog, where you choose Spark as the Project. The Component should be MLlib.
Please see also:
http://search-hadoop.com/m/q3RTtmsshe1W6cH22/spark+pull+template&subj=pull+request+template

On Mon, Feb 22, 2016 at 6:45 PM, Pierson, Oliver C <o...@gatech.edu> wrote:

> Hello,
>
> I've discovered a bug in the QuantileDiscretizer estimator.
> Specifically, for large DataFrames QuantileDiscretizer will only create one
> split (i.e. two bins).
>
> The error happens in lines 113 and 114 of QuantileDiscretizer.scala:
>
> val requiredSamples = math.max(numBins * numBins, 10000)
> val fraction = math.min(requiredSamples / dataset.count(), 1.0)
>
> After the first line, requiredSamples is an Int. Therefore, if
> requiredSamples < dataset.count() then fraction is always 0.0.
>
> The problem can be simply fixed by replacing the first line with:
>
> val requiredSamples = math.max(numBins * numBins, 10000.0)
>
> I've implemented this change in my fork and all tests passed (except for
> docker integration, but I think that's another issue). I'm happy to submit
> a PR if it will ease someone else's workload. However, I'm unsure of how
> to create a JIRA. I've created an account on the issue tracker (
> issues.apache.org) but when I try to create an issue it asks me to choose
> a "Service Desk". Which one should I be choosing?
>
> Thanks much,
>
> Oliver Pierson
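
For anyone following along, the integer-division behaviour described in the quoted message can be reproduced in plain Scala without Spark. This is only a sketch under stated assumptions: the object name FractionSketch and the helpers buggyFraction and fixedFraction are made up for illustration, and the count parameter stands in for dataset.count(). It is not the actual QuantileDiscretizer code.

```scala
// Standalone sketch, no Spark needed. All names here are hypothetical;
// `count` plays the role of dataset.count() (a Long) and is assumed > 0.
object FractionSketch {

  // Mirrors the two quoted lines: requiredSamples is an Int, so
  // requiredSamples / count is integer (Long) division and truncates.
  def buggyFraction(numBins: Int, count: Long): Double = {
    val requiredSamples = math.max(numBins * numBins, 10000)
    math.min(requiredSamples / count, 1.0)
  }

  // Mirrors the proposed fix: the 10000.0 literal makes requiredSamples
  // a Double, so the division is floating-point.
  def fixedFraction(numBins: Int, count: Long): Double = {
    val requiredSamples = math.max(numBins * numBins, 10000.0)
    math.min(requiredSamples / count, 1.0)
  }

  def main(args: Array[String]): Unit = {
    val numBins = 10
    val largeCount = 1000000L // more rows than requiredSamples

    println(s"buggy fraction: ${buggyFraction(numBins, largeCount)}")
    println(s"fixed fraction: ${fixedFraction(numBins, largeCount)}")
  }
}
```

With 10 bins and one million rows, the first helper truncates 10000 / 1000000 to 0, while the second yields a sampling fraction of 0.01. When the row count is smaller than requiredSamples, both versions are capped at 1.0 by math.min, which is consistent with the problem showing up only on large DataFrames.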