Re: [R] GAM with the negative binomial distribution: why do predictions no match with original values?

peter dalgaard Fri, 25 Nov 2016 12:27:07 -0800

> On 23 Nov 2016, at 23:32 , Simon Wood <simon.w...@bath.edu> wrote:
> 
> ?predict.gam (mgcv) says....
>  
>    "Note that, in common with other prediction functions, any offset
>      supplied to ‘gam’ as an argument is always ignored when
>      predicting, unlike offsets specified in the gam model formula."
>



> .... which was originally implemented to prevent surprises to people familiar 
> with predict.lm (which used to behave that way, and was documented to behave 
> that way).... the problem is that predict.lm doesn't behave like that any 
> more, so I guess at some point I should remove this feature...
> 

Fortune candidate...

-pd

> best,
> Simon 
>  
> 
> 
> 
> On 23/11/16 00:24, Marine Regis wrote:
>> Thanks a lot for your answers.
>> Peter: sorry, here is the missing information:
>> 
>>   *   I use the function gam() of the package �mgcv�
>>   *   Yes, the output changes when I use offset(log_trap_eff) instead of 
>> offset=log_trap_eff. By using offset(log_trap_eff), the output is more 
>> coherent with the observed values. Here are the new predictions:
>> 
>>> summary(mod$fit)
>>> 
>> Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> 10.01   68.14   85.71   83.16  101.00  130.20
>> 
>> 
>>   *   I have tried to create a reproductive example to show the difference 
>> between offset(log_trap_eff) and offset=log_trap_eff.
>> nb_unique <- rnegbin(58, mu=82, theta=13.446)
>> x <- runif(58,min=-465300,max=435200)
>> prop_forest <- runif(58,min=0,max=1)
>> log_trap_eff <- runif(58,min=4,max=6)
>> 
>> With offset=log_trap_eff:
>> 
>> mod1 <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, 
>> family=nb(theta=NULL, link="log"), method = "REML", select = TRUE)
>> 
>> 
>>> mod1Pred <- predict.gam(mod1, se.fit=TRUE, type="response")
>>> 
>>> summary(mod1Pred$fit)
>>> 
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> 
>>  0.5852  0.5852  0.5852  0.5852  0.5852  0.5852
>> 
>> 
>> With offset(log_trap_eff):
>> mod2 <- gam(nb_unique ~ s(x,prop_forest) + offset(log_trap_eff), 
>> family=nb(theta=NULL, link="log"), method = "REML", select = TRUE)
>> 
>> 
>>> mod2Pred <- predict.gam(mod2, se.fit=TRUE, type="response")
>>> 
>>> summary(mod2Pred$fit)
>>> 
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> 
>>   32.03   61.18   97.20  112.20  165.00  226.00
>> 
>> 
>> 
>> Value range of observed data:
>> 
>> 
>>> summary(nb_unique)
>>> 
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> 
>>   43.00   67.00   81.00   84.16   92.75  153.00
>> 
>> 
>>   *   By using fitted(mod), I obtain NULL.
>> I am a novice in GAMs. So, I don�t know why the results are different 
>> between models with offset=argument and offset().
>> Thanks a lot for your help.
>> Have a nice day
>> Marine
>> 
>> 
>> 
>> ________________________________
>> De : peter dalgaard 
>> <pda...@gmail.com>
>> 
>> Envoy� : mardi 22 novembre 2016 23:52
>> � : Bert Gunter
>> Cc : Marine Regis; 
>> r-help@r-project.org
>> 
>> Objet : Re: [R] GAM with the negative binomial distribution: why do 
>> predictions no match with original values?
>> 
>> 
>> 
>>> On 22 Nov 2016, at 23:07 , Bert Gunter <bgunter.4...@gmail.com>
>>>  wrote:
>>> 
>>> Define "very different."  Sounds like a subjective opinion to me, for
>>> which I have no response. Apparently others are similarly flummoxed.
>>> Of course they would not in general be identical.
>>> 
>> Er? I don't see much reason to disagree that a range 0.10-0.18 is different 
>> from 17-147.
>> 
>> However, other bits of information are missing: We don't know which gam() 
>> function is being used (to my knowledge there is one in package gam but also 
>> one in mgcv). We don't have the data, so we cannot reproduce and try to find 
>> the root of the problem.
>> 
>> Offhand, it looks like the predict.gam() function is misbehaving, which 
>> could have something to do with the offset term and/or the nb dispersion 
>> parameter. On a hunch, does anything change if you use
>> 
>> nb_unique ~ s(x,prop_forest) + offset(log_trap_eff)
>> 
>> instead of the offset= argument? And, by the way, does fitted(mod,...) 
>> change anything?
>> 
>> -pd
>> 
>> 
>>> Cheers,
>>> Bert
>>> 
>>> 
>>> Bert Gunter
>>> 
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> 
>>> 
>>> On Tue, Nov 22, 2016 at 1:29 PM, Marine Regis 
>>> <marine.re...@hotmail.fr>
>>>  wrote:
>>> 
>>>> Hello,
>>>> 
>>>> 
>>>>> From capture data, I would like to assess the effect of longitudinal 
>>>>> changes in proportion of forests on abundance of skunks. To test this, I 
>>>>> built this GAM where the dependent variable is the number of unique 
>>>>> skunks and the independent variables are the X coordinates of the 
>>>>> centroids of trapping sites (called "X" in the GAM) and the proportion of 
>>>>> forests within the trapping sites (called "prop_forest" in the GAM):
>>>>> 
>>>>    mod <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, 
>>>> family=nb(theta=NULL, link="log"), data=succ_capt_skunk, method = "REML", 
>>>> select = TRUE)
>>>>    summary(mod)
>>>> 
>>>>    Family: Negative Binomial(13.446)
>>>>    Link function: log
>>>> 
>>>>    Formula:
>>>>    nb_unique ~ s(x, prop_forest)
>>>> 
>>>>    Parametric coefficients:
>>>>                Estimate Std. Error z value Pr(>|z|)
>>>>    (Intercept) -2.02095    0.03896  -51.87   <2e-16 ***
>>>>    ---
>>>>    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>> 
>>>>    Approximate significance of smooth terms:
>>>>                       edf Ref.df Chi.sq  p-value
>>>>    s(x,prop_forest) 3.182     29  17.76 0.000102 ***
>>>>    ---
>>>>    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>> 
>>>>    R-sq.(adj) =   0.37   Deviance explained =   49%
>>>>    -REML = 268.61  Scale est. = 1         n = 58
>>>> 
>>>> 
>>>> I built a GAM  for the negative binomial family. When I use the function 
>>>> `predict.gam`, the predictions of capture success from the GAM and the 
>>>> values of capture success from original data are very different. What is 
>>>> the reason for differences occur?
>>>> 
>>>> **With GAM:**
>>>> 
>>>>    modPred <- predict.gam(mod, se.fit=TRUE,type="response")
>>>>    summary(modPred$fit)
>>>>       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>     0.1026  0.1187  0.1333  0.1338  0.1419  0.1795
>>>> 
>>>> **With original data:**
>>>> 
>>>>    summary(succ_capt_skunk$nb_unique)
>>>>       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>>>      17.00   59.00   82.00   81.83  106.80  147.00
>>>> 
>>>> The question has already been posted on Cross validated (
>>>> http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or
>>>> ) without success.
>>>> 
>> [http://cdn.sstatic.net/Sites/stats/img/apple-touch-i...@2.png?v=344f57aa10cc&a]<http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or>
>> 
>> 
>> GAM with the negative binomial distribution: why do predictions no match 
>> with original values?
>> <http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or>
>> 
>> stats.stackexchange.com
>> >From capture data, I would like to assess the effect of longitudinal 
>> >changes in proportion of forests on abundance of skunks. To test this, I 
>> >built this GAM where the dependent variable is the numb...
>> 
>> 
>> 
>> 
>>>> Thanks a lot for your time.
>>>> Have a nice day
>>>> Marine
>>>> 
>>>> 
>>>>        [[alternative HTML version deleted]]
>>>> 
>>>> ______________________________________________
>>>> 
>>>> R-help@r-project.org
>>>>  mailing list -- To UNSUBSCRIBE and more, see
>>>> 
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> thz.ch/mailman/listinfo/r-help>
>> stat.ethz.ch
>> The main R mailing list, for announcements about the development of R and 
>> the availability of new code, questions and answers about problems and 
>> solutions using R ...
>> 
>> 
>> 
>> 
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> 
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> ______________________________________________
>>> 
>>> R-help@r-project.org
>>>  mailing list -- To UNSUBSCRIBE and more, see
>>> 
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> thz.ch/mailman/listinfo/r-help>
>> stat.ethz.ch
>> The main R mailing list, for announcements about the development of R and 
>> the availability of new code, questions and answers about problems and 
>> solutions using R ...
>> 
>> 
>> 
>> 
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> 
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: 
>> pd....@cbs.dk  Priv: pda...@gmail.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>      [[alternative HTML version deleted]]
>> 
>> 
>> 
>> 
>> ______________________________________________
>> 
>> R-help@r-project.org
>>  mailing list -- To UNSUBSCRIBE and more, see
>> 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> 
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> 
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> 
> -- 
> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
> +44 (0)117 33 18273     
> http://www.maths.bris.ac.uk/~sw15190

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GAM with the negative binomial distribution: why do predictions no match with original values?

Reply via email to