Doesn't the "random" part of random forest defend against overfitting?
-----Original Message-----
From: ey-chih chow [mailto:[email protected]]
Sent: Saturday, April 06, 2013 5:45 PM
To: [email protected]
Subject: Re: Classification Algorithms in Mahout

I actually got a lot of overfitting. The parameter that I can adjust is minSplitNum. Are there any other parameters that I can adjust to avoid overfitting?

Thanks.

Ey-Chih

On Wed, Mar 27, 2013 at 3:12 PM, Andy Twigg <[email protected]> wrote:

> Dear Ey-Chih,
>
> What are your use cases for a better random forest?
>
> On 27 March 2013 11:59, Yutaka Mandai <[email protected]> wrote:
> > My understanding is that the current Random Forest has a certain level of
> > improvement for running on a Hadoop cluster, from a data splitting
> > alignment perspective, for better balanced CPU utilization.
> >
> > Regards,
> > Y.Mandai
> >
> > Sent from my iPhone
> >
> > On 2013/03/25, at 14:48, Ted Dunning <[email protected]> wrote:
> >
> >> I think that there are some others who could say more.
> >>
> >> On Mon, Mar 25, 2013 at 6:01 AM, Ey-Chih chow <[email protected]> wrote:
> >>
> >>> On Mar 24, 2013, at 1:00 AM, Ted Dunning wrote:
> >>>
> >>>> - random forest, sequential and parallel implementations, new
> >>>> versions are being developed, the current version may or may not
> >>>> be useful to you.
> >>>>
> >>> Can you elaborate on the usefulness of the current version and the
> >>> features of the new versions? Thanks.
> >>>
> >>> Ey-Chih Chow
> >>>
> >>>
> >>> On Mar 24, 2013, at 1:00 AM, Ted Dunning wrote:
> >>>
> >>>> You are correct to suspect that this page is substantially out of date.
> >>>>
> >>>> Currently, Mahout has the following classifiers:
> >>>>
> >>>> - stochastic gradient descent for logistic regression (SGD) with
> >>>> L_1 or L_2 regularization, sequential version only. These
> >>>> classifiers can be easily extended with other gradients and
> >>>> regularizers which should make linear SVMs easy to implement.
> >>>>
> >>>> - naive bayes, sequential and parallel implementations
> >>>>
> >>>> - random forest, sequential and parallel implementations, new
> >>>> versions are being developed, the current version may or may not
> >>>> be useful to you.
> >>>>
> >>>> There are a variety of other classifiers which are in various
> >>>> states of utility.
> >>>>
> >>>> On Mar 24, 2013, at 4:07 AM, Chidananda Sridhar wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am doing a class project on classification and want to use Mahout.
> >>>>> I was searching for the classification algorithms already implemented
> >>>>> in Mahout and came to this page:
> >>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >>>>>
> >>>>> The webpage says that Online Passive Aggressive
> >>>>> <https://cwiki.apache.org/confluence/display/MAHOUT/Online+Passive+Aggressive>
> >>>>> is integrated and the rest of the classification algorithms are
> >>>>> open or awaiting commit. Does the webpage have the latest
> >>>>> information, or is it yet to be updated? Is "Online Passive
> >>>>> Aggressive" the only algorithm I can use for now? On the other
> >>>>> hand, I see that most of the clustering algorithms have been
> >>>>> integrated.
> >>>>>
> >>>>> Thanks,
> >>>>> Chidananda
> >>>>
> >>>
> >>>
> >
> >
> --
> Dr Andy Twigg
> Junior Research Fellow, St Johns College, Oxford
> Room 351, Department of Computer Science
> http://www.cs.ox.ac.uk/people/andy.twigg/
> [email protected] | +447799647538
>
