Re: [R] Question about Cubist Model

2017-01-12 Thread Mxkuhn
> On Jan 12, 2017, at 5:37 PM, Lorenzo Isella wrote: > > Dear All, > I am fine tuning a Cubist model (see > https://cran.r-project.org/web/packages/Cubist/index.html). > I am a bit puzzled by its output. On a dataset which contains 275 > cases, I get non mutually exclusive rules. > E.g., in the

Re: [R] what constitutes a 'complete sentence'?

2015-07-04 Thread mxkuhn
I encountered this a few months ago and, in my case, the sentence had a noun and verb but lacked a period at the end of the sentence. I tested that 'blah blah blah.' would have passed in that version of R-devel. Whenever I find a new rule or test with R CMD check, I tell myself that it must be

Re: [R] Caret and Model Prediction

2014-10-06 Thread mxkuhn
> On Oct 5, 2014, at 4:51 PM, Lorenzo Isella wrote: > > Thanks a lot. > At this point then I wonder: seen that my response consists of 5 > outcomes for each set of features, should I then train 5 different > models (one for each of them)? > Cheers caret can only model one outcome at a time so

Re: [R] Caret train with glmnet give me Error "arguments imply differing number of rows"

2013-06-11 Thread Mxkuhn
The data size isn't an issue. Can you send a reproducible example? Max On Jun 11, 2013, at 10:31 AM, Ferran Casarramona wrote: > Hello, > > I'm training a set of data with Caret package using an elastic net (glmnet). > Most of the time train works ok, but when the data set grows in size I ge

Re: [R] Parallelizing GBM

2013-03-24 Thread Mxkuhn
Yes, I think the second link is a test build of a parallelized cv loop within gbm(). On Mar 24, 2013, at 9:28 AM, "Lorenzo Isella" wrote: > Thanks a lot for the quick answer. > However, from what I see, the parallelization affects only the > cross-validation part in the gbm interface (but it

Re: [R] LOOCV over SVM,KNN

2013-03-23 Thread mxkuhn
train() in caret. See http://caret.r-forge.r-project.org/ Also, the C5.0 function in the C50 package is much more effective than J48. Max On Mar 23, 2013, at 2:57 PM, Nicolás Sánchez wrote: > Good afternoon. > > I would like to know if there is any function in R to do LOOCV with these > classi

Re: [R] Feature selection package for text mining

2013-03-13 Thread mxkuhn
caret has recursive feature elimination and simple feature filters. I've got some genetic algorithm code (using the GA package). CORElearn also has the relief algorithm and a lot of different measures of feature importance. Max On Mar 13, 2013, at 3:57 AM, "C.H." wrote: > FSelector > > Maybe chi-sq i

Re: [R] odfWeave: Trouble Getting the Package to Work

2013-02-18 Thread mxkuhn
What version of odfWeave and XML? On Feb 18, 2013, at 11:49 AM, Paul Miller wrote: > Hi Max, > > Sorry I didn't provide sufficient information. Below is my sessionInfo with > all code included. > > Thanks, > > Paul > > sessionInfo() > R version 2.13.1 (2011-07-08) > Platform: x86_64-pc-

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-21 Thread mxkuhn
There is also C5.0 in the C50 package. It tends to have smaller trees than C4.5 and much smaller trees than J48 when there are factor predictors. Also, it has an optional feature selection ("winnow") step that can be used. Max On Sep 21, 2012, at 2:18 AM, Achim Zeileis wrote: > Hi, > > just

Re: [R] Repeated cross-validation for a lm object

2012-02-18 Thread mxkuhn
The train function in the caret package will do this. The trainControl function would use method ="repeatedcv" and repeats = 100. On Feb 18, 2012, at 2:15 PM, Greg Snow <538...@gmail.com> wrote: > The validate function in the rms package can do cross validation of > ols objects (ols is similar
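The repeated cross-validation that caret's train()/trainControl() pair performs can be illustrated with a minimal base-R sketch (the data and function names here are hypothetical, not caret's API): fit an lm on k-1 folds, score the held-out fold, and repeat the whole fold assignment several times.

```r
# Sketch: repeated k-fold cross-validation of an lm fit, in base R only.
# caret does this via trainControl(method = "repeatedcv", repeats = ...).
set.seed(1)
dat <- data.frame(x = runif(100))
dat$y <- 2 * dat$x + rnorm(100, sd = 0.1)

repeated_cv_rmse <- function(data, k = 10, repeats = 5) {
  rmse <- c()
  for (r in seq_len(repeats)) {
    # new random fold assignment on every repeat
    folds <- sample(rep(seq_len(k), length.out = nrow(data)))
    for (f in seq_len(k)) {
      fit  <- lm(y ~ x, data = data[folds != f, ])
      pred <- predict(fit, newdata = data[folds == f, ])
      rmse <- c(rmse, sqrt(mean((data$y[folds == f] - pred)^2)))
    }
  }
  mean(rmse)  # average RMSE over k * repeats held-out folds
}
repeated_cv_rmse(dat)
```

Averaging over the repeats is what reduces the variance of the performance estimate relative to a single 10-fold split.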

Re: [R] Rpart

2011-10-31 Thread mxkuhn
This mostly happens when the data contain invalid column names (such as all numbers). Try using make.names() on the datasets. Max On Oct 30, 2011, at 11:35 AM, Luisa Sêco wrote: > Dear users, > > I'm using rpart for classification trees, but my code isn't working when I > try to use all the
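The make.names() fix is simple to show with a toy data frame (the column names below are invented for illustration): invalid names such as bare numbers become syntactically valid ones that formula interfaces like rpart's can handle.

```r
# Column names that are all numbers (or contain spaces) break formula
# interfaces such as rpart's; make.names() converts them to valid names.
df <- data.frame(matrix(rnorm(20), ncol = 4))
names(df) <- c("1", "2", "x y", "valid")
names(df) <- make.names(names(df))
names(df)  # "X1" "X2" "x.y" "valid"
```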

Re: [R] glmnet with binary logistic regression

2011-07-24 Thread mxkuhn
10-fold CV has high variation compared to other methods. Use repeated CV or the bootstrap instead (both of which can be used with glmnet by way of the train() function in the caret package). Max On Jul 23, 2011, at 11:43 AM, fongchun wrote: > Hi Patrick, > > Thanks for the reply. I am ref

Re: [R] use "caret" to rank predictors by random forest model

2011-03-14 Thread mxkuhn
Xiaoqi, You need to specify the sizes. There are other search algorithms that automatically pick the size (such as genetic algorithms), but I don't have those in the package yet. Another approach is to use univariate filtering (see the sbf function in caret). Max On Mar 13, 2011, at 8:49 PM,

Re: [R] odfWeave graphics glitch

2011-02-22 Thread mxkuhn
John, What version of odfWeave and OO are you using? Thanks, Max On Feb 22, 2011, at 3:17 PM, "Prof. John C Nash" wrote: > Using R2.12.1 on Ubuntu 10.04.1 I've tried to run the following code chunk in > odfWeave > > <>= > x<-seq(1:100)/10 > y<-sin(cos(x/pi)) > imageDefs <- getImageDefs() >

Re: [R] Random Forest & Cross Validation

2011-02-22 Thread mxkuhn
If you want to get honest estimates of accuracy, you should repeat the feature selection within the resampling (not the test set). You will get different lists each time, but that's the point. Right now you are not capturing that uncertainty which is why the oob and test set results differ so mu

Re: [R] Prediction accuracy from Bagging with continuous data

2011-02-10 Thread mxkuhn
If you do use correlation, you should think about doing it on the log or sqrt scale. The train() function in the caret package can estimate performance using resampling. There are examples in ?train that show how to define custom performance measures (I think it shows how to do this with MAD es

Re: [R] Train error:: subscript out of bonds

2011-01-25 Thread mxkuhn
You should try different tuning parameters; the defaults are not likely to work for many datasets. I don't use the polynomial kernel too much, but scale parameter values that are really off could cause this. Unlike the RBF kernel, I don't know of any good techniques for estimating this. Max On Jan 25,

Re: [R] less than full rank contrast methods

2010-12-07 Thread mxkuhn
Greg and Frank, Thanks for the replies. I didn't express myself very well; I'm not interested in the model fitting aspect. I'd just like to get the full set of dummy variables (optimally from model.matrix). Max On Dec 6, 2010, at 10:29 PM, Frank Harrell wrote: > > Given a non-singular fit, the

Re: [R] to determine the variable importance in svm

2010-10-27 Thread mxkuhn
> is ipred:errorest method is good enough for validating (or cross check) my > svm result? Yes, if you know what values of the tuning parameters to use, but I don't know why it was failing. Max __ R-help@r-project.org mailing list https://stat.ethz.c

Re: [R] Random Forest AUC

2010-10-23 Thread mxkuhn
I think the issue is that you really can't use the training set to judge this (without resampling). For example, k nearest neighbors are not known to over fit, but a 1nn model will always perfectly predict the training data. Max On Oct 23, 2010, at 9:05 AM, "Liaw, Andy" wrote: > What Breim
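The 1-NN point can be seen with a tiny hand-rolled nearest-neighbor check in base R (illustrative toy data, not any particular package's API): evaluated on its own training set, every point's nearest neighbor is itself, so training accuracy is always perfect no matter how well the model generalizes.

```r
# 1-NN "evaluated" on the training set: each row's nearest neighbor in the
# distance matrix is itself (distance 0), so predictions match perfectly.
set.seed(2)
x <- matrix(rnorm(40), ncol = 2)
y <- sample(c("a", "b"), 20, replace = TRUE)
d <- as.matrix(dist(x))
pred <- y[apply(d, 1, which.min)]  # nearest neighbor of each row
mean(pred == y)  # 1: perfect "accuracy", telling us nothing about overfitting
```

This is why resampled (held-out) performance, not training-set performance, is needed to judge a model.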

Re: [R] Random Forest - Strata

2010-07-21 Thread mxkuhn
If you use the index argument of the trainControl() function in the caret package, the train() function can be used for this type of resampling (and you'll get some decent summaries and visualizations to boot). Max On Jul 21, 2010, at 7:11 AM, "Tim Howard" wrote: > Coll, > > An alternative ap
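The index argument of trainControl() takes a list with one integer vector of training-row indices per resample. A hedged base-R sketch of building stratified bootstrap indices by a grouping factor (the function and variable names here are invented for illustration):

```r
# Build stratified resampling indices: bootstrap rows within each stratum
# so every resample preserves the strata counts. A list like this could be
# supplied to trainControl(index = ...).
set.seed(3)
strata <- factor(rep(c("A", "B"), times = c(30, 10)))

make_strat_index <- function(strata, times = 25) {
  lapply(seq_len(times), function(i) {
    unlist(lapply(split(seq_along(strata), strata),
                  function(rows) sample(rows, replace = TRUE)))
  })
}

idx <- make_strat_index(strata)
length(idx)              # 25 resamples
table(strata[idx[[1]]])  # A: 30, B: 10 in every resample
```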