Thank you for your advice. I will try even higher "gamma" values, as well as the repeated cross-validation you suggest.
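Concretely, I am thinking of something like the following minimal sketch of repeated 10-fold cross-validation for the model structure discussed below (fit <- gam(y ~ x1 + s(x2))); the data frame `dat` and its columns y, x1 and x2 are placeholders:

library(mgcv)

## Repeated 10-fold cross-validation (50 repeats, as suggested).
## 'dat' is a hypothetical data frame with columns y, x1 and x2;
## for very small n the basis dimension of s(x2) may need lowering.
cv.mse <- function(dat, n.rep = 50, n.fold = 10) {
  mse <- numeric(n.rep)
  for (r in seq_len(n.rep)) {
    ## random fold assignment for this repeat
    fold <- sample(rep(seq_len(n.fold), length.out = nrow(dat)))
    sq.err <- numeric(nrow(dat))
    for (k in seq_len(n.fold)) {
      test <- fold == k
      fit <- gam(y ~ x1 + s(x2), data = dat[!test, ])
      sq.err[test] <- (dat$y[test] - predict(fit, newdata = dat[test, ]))^2
    }
    mse[r] <- mean(sq.err)
  }
  mse  ## distribution of the CV mean square error over the repeats
}

Looking at the spread of the 50 values should give a more stable picture of prediction error than a single randomly selected validation set.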
2007/10/3, Frank E Harrell Jr <[EMAIL PROTECTED]>:
> Ariyo Kanno wrote:
> > Sorry, let me fix one sentence:
> >
> > "By "overfitting" I mean that GCV was significantly SMALLER than the
> > mean square error of prediction on validation data that were randomly
> > selected and not used for the regression."
> >
> >> Thank you for the valuable advice.
>
> If your test sample includes fewer than 10,000 cases and your
> signal-to-noise ratio is not large, your estimate of cross-validation
> accuracy may be unreliable. Often 50 repeats of 10-fold cross-validation
> are required, without setting aside a single "test" sample.
>
> Frank
>
> >> I'm sorry, Dr. Wood, that by mistake I first sent this reply to your
> >> personal e-mail address.
> >>
> >> I will use the "min.sp" argument when the data size is very small. I'd
> >> like to know whether there are any criteria for selecting "min.sp".
> >>
> >> I compared gamma = 1.0 and 1.4, and I could see the smoothing effect of
> >> increasing gamma by comparing the edf and the smoothing parameter, but
> >> it was not enough to suppress the overfitting when the data size was
> >> small.
> >>
> >> By "overfitting" I mean that GCV was significantly larger than the mean
> >> square error of prediction on validation data that were randomly
> >> selected and not used for the regression.
> >>
> >> Best wishes,
> >> Ariyo
> >>
> >> 2007/10/3, Simon Wood <[EMAIL PROTECTED]>:
> >>> On Wednesday 03 October 2007 10:49, Ariyo Kanno wrote:
> >>>> I appreciate your quick reply.
> >>>> I am using a model of the following structure:
> >>>>
> >>>> fit <- gam(y ~ x1 + s(x2))
> >>>>
> >>>> where y, x1, and x2 are quantitative variables, so the response
> >>>> distribution is assumed to be Gaussian (the default).
> >>>>
> >>>> Now I understand that the data size was too small.
> >>> -- Well, the 10 end is definitely too small, but you can get quite
> >>> reasonable estimates of a single smoothing parameter from 30+ Gaussian
> >>> data points.
> >>> -- You can force smoother models by either setting the smoothing
> >>> parameter yourself using the `sp' argument to `gam', or by using the
> >>> `min.sp' argument to set a lower bound on the smoothing parameter.
> >>> -- I'm surprised that `gamma' had no effect - how high did you try?
> >>>
> >>> best,
> >>> Simon
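For reference, the three controls mentioned above would look something like this with my model; the data frame `dat` is a placeholder and the numeric sp/min.sp values are only illustrative, not recommendations:

library(mgcv)

## heavier GCV penalty per effective degree of freedom
fit.gamma <- gam(y ~ x1 + s(x2), data = dat, gamma = 1.4)

## fix the smoothing parameter for s(x2) directly
fit.sp <- gam(y ~ x1 + s(x2), data = dat, sp = 10)

## or only bound it from below, leaving GCV to choose anything above the bound
fit.minsp <- gam(y ~ x1 + s(x2), data = dat, min.sp = 1)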
> >>>> Thank you.
> >>>>
> >>>> Best wishes,
> >>>>
> >>>> Ariyo
> >>>>
> >>>> 2007/10/3, Simon Wood <[EMAIL PROTECTED]>:
> >>>>> What sort of model structure are you using? In particular, what is
> >>>>> the response distribution? For Poisson and binomial responses,
> >>>>> overfitting can be a sign of overdispersion, and quasipoisson or
> >>>>> quasibinomial may be better. Also, I would not expect to get useful
> >>>>> smoothing parameter estimates from 10 data points!
> >>>>>
> >>>>> best,
> >>>>> Simon
> >>>>>
> >>>>> On Wednesday 03 October 2007 06:55, Ariyo Kanno wrote:
> >>>>>> Dear listers,
> >>>>>>
> >>>>>> I'm using gam (from mgcv) for semi-parametric regression on small
> >>>>>> and noisy datasets (10 to 200 observations), and I am facing a
> >>>>>> problem of overfitting.
> >>>>>>
> >>>>>> According to the book (Simon N. Wood, Generalized Additive Models:
> >>>>>> An Introduction with R), overfitting can be avoided by inflating
> >>>>>> the effective degrees of freedom in the GCV score with an increased
> >>>>>> "gamma" value (e.g. 1.4). But in my case it did not make a
> >>>>>> significant change in the results.
> >>>>>>
> >>>>>> The only way I've found to suppress overfitting is to set the basis
> >>>>>> dimension "k" to very low values (3 to 5). However, I don't think
> >>>>>> this is reasonable, because knot selection then becomes an
> >>>>>> important issue.
> >>>>>>
> >>>>>> Are there any other means of avoiding overfitting when analyzing
> >>>>>> small datasets?
> >>>>>>
> >>>>>> Thank you for your help in advance,
> >>>>>> Ariyo Kanno
> >>>>>>
> >>>>>> --
> >>>>>> Ariyo Kanno
> >>>>>> First-year doctoral student,
> >>>>>> Institute of Environmental Studies,
> >>>>>> The University of Tokyo
> >>>>> --
> >>>>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> >>>>> +44 1225 386603 www.maths.bath.ac.uk/~sw283
> >>> --
> >>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> >>> +44 1225 386603 www.maths.bath.ac.uk/~sw283
> --
> Frank E Harrell Jr, Professor and Chair
> Department of Biostatistics, School of Medicine
> Vanderbilt University
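P.S. The basis-dimension restriction I mentioned in my original post was along the lines of the following (again with a placeholder data frame `dat`):

library(mgcv)

## restrict the basis dimension of the smooth of x2 to a small value
fit.k <- gam(y ~ x1 + s(x2, k = 4), data = dat)

With k this small the fit depends noticeably on where the few knots fall, which is why I would prefer a penalty-based solution.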