Re: [R] Gamma distribution parameter estimation

Prof Brian Ripley Sun, 07 Aug 2011 00:20:03 -0700

Well, 'Alexander Engelhardt' failed to follow the posting guide, andso did not get a reply from some knowledgeable people.

(1) When using fitdistr you need to roughly scale your data: e.g.divide by 1000 here.

(2) fitdistr does 'go down the maximum likeilhood route' and it doesfind reasonable starting points.

The mean is *not* the mle for the rate, and Mr Harding's gamma pdf isparametrized by 'scale' not 'rate' (see the R help page). Probablynot its intention, but Harding's posting with its schoolboy errorsexemplifies the inadvisibility of 'rolling your own'.

(3) In the same package there is a function gamma.shape to find themle of the shape given the mean. And the package is support for abook, and this should be obvious from the book's index.

(4) A value of 2750 for a shape parameter is totally implausible.Have you any idea at all what gamma(shape = 2750) looks like?


MASS's function gives

b <- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,

3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L ,1699L,4545L)

fit <- glm(b ~ 1, family=Gamma()
coef(fit)

 (Intercept)
0.0003391958

gamma.shape(fit)


Alpha: 7.786827
SE:    2.411487

At that point you have mles of 1/mean and shape. The mle of therate is then

0.0003391958*7.786827

[1] 0.002641259

It's so much easier to use fitdistr:

fitdistr(b/1000, "gamma")

     shape       rate
  7.7871427   2.6413668
 (2.4115976) (0.8449421)

And as a check of plausibility:
dx <- seq(0, 6000, len = 100)
plot(dx, dgamma(dx, shape=7.8, rate=2.6e-3), type = "l")
rug(b)


On Sun, 7 Aug 2011, ted.hard...@wlandres.net wrote:

On 06-Aug-11 19:37:49, Alexander Engelhardt wrote:

On 08/06/2011 09:23 PM, Jorge Ivan Velez wrote:

Hi Alex,

Try

require(MASS)

Loading required package: MASS

b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,

+ 3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L,
+ 1699L, 4545L)

fitdistr(b, 'gamma')

       shape           rate
   6.4528939045   0.0021887943
  (0.7722567310) (0.0001854559)


Hi,

I'm getting a weird error here, pasted below. Do you know what
could cause that?

> fitdistr(b,"gamma")
Error in optim(x = c(2039L, 957L, 2088L, 5966L, 2307L, 2044L,
2353L, 1966L,  :
   non-finite finite-difference value [2]
In addition: Warning message:
In dgamma(x, shape, scale, log) : NaNs produced


I'm surprised no=-one has suggested going straight down the
maximum-likelihood route. The solution for the 'rate' parameter
is immediate, while the solution for the 'shape' paramater
needs the solution of an equation in one variable imvolving
the digamma function psi(k).

Writing the density of the gamma distribution as

 (1/G(k))*(x^(k-1))*exp(x/r)/(r^k)

where k is the shape parameter and r is the rate, and
G(k) denotes the gamma function Gamma(k), we have for
the estimates

[1]  r = mean(x.i)

[2]  psi(k) = mean(log(x.i))

where {x.1,x.2,...,x.n} is the sample, and psi(k) denotes
the digamma function (G'(k))/G(k) = (log(G(k))' (the "'"
denotes first derivative).

So you just need to solve [2] for k.

This is well summarised in the Wikipedia article:

http://en.wikipedia.org/wiki/
Gamma_distribution#Maximum_likelihood_estimation

in the Section "Parameter estimation".

Since the digamma function digamma() is in the base package,
all you need to do is submit it to a root-finder, here done
in the naivest possible way.

With your values of

 x<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L,
       2013L, 3376L, 3363L, 3567L, 4798L, 2032L, 1699L,
       3001L, 2329L, 3944L, 2568L, 1699L, 4545L)

you get

 mean(log(x))
 # [1] 7.92335

which is quite a modest number (though in a somewhat tense
relationship with the digamma function).

A bit of experimentation with plot() will lead you home:

 t<-(50:200);     plot(t,digamma(t))
 t<-(100:2000);   plot(t,digamma(t))
 t<-20*(100:200); plot(t,digamma(t))

from which the root is somewhere near 2750.

 t<-(2700:2800); plot(t,digamma(t))
 t<-(2745:2865); plot(t,digamma(t))
 t<-2755+0.1*(0:100); plot(t,digamma(t))

So:

 mean(log(x)) - digamma(2760)
 # [1] 0.0005452383
 mean(log(x)) - digamma(2769)
 # [1] -0.002710915
 mean(log(x)) - digamma(2765)
 # [1] -0.001265045
 mean(log(x)) - digamma(2763)
 # [1] -0.0005413247
 mean(log(x)) - digamma(2762)
 # [1] -0.0001792682
 mean(log(x)) - digamma(2761)
 # [1] 0.0001829194
 mean(log(x)) - digamma(2761.5)
 # [1] 1.809230e-06
 mean(log(x)) - digamma(2761.75)
 # [1] -8.873357e-05
 mean(log(x)) - digamma(2761.625)
 # [1] -4.34632e-05
 mean(log(x)) - digamma(2761.5625)
 # [1] -2.082724e-05
 mean(log(x)) - digamma(2761.53125)
 # [1] -9.509068e-06
 mean(log(x)) - digamma(2761.515625)
 # [1] -3.849935e-06
 mean(log(x)) - digamma(2761.5078125)
 # [1] -1.020356e-06
 mean(log(x)) - digamma(2761.50390625)
 # [1] 3.944361e-07
 mean(log(x)) - digamma(2761.505859375)
 # [1] -3.129603e-07
 mean(log(x)) - digamma(2761.5048828125)
 # [1] 4.073787e-08

And so it goes ... Anyway, to 7 significant figures,
the answer is that k = 2761.505, and that is "by hand"
using a simple "interval-halving" procedure.

I don't know why fitdistr() threw an error. It may be
that the starting-value for the iterations was to far
off and threw it into some difficult space. The good
starting value above was obtained by a bit of graphical
trial and error. Maybe something similar can be emulated
in R?

Hoping this helps,
Ted.

PS
Oh, and while I'm at it, the estimate of the rate is

 r = mean(x) = 2948.15


--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-11                                       Time: 02:03:29
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Gamma distribution parameter estimation

Reply via email to