Re: [R] Gamma distribution parameter estimation

Peter Ehlers Sun, 07 Aug 2011 18:22:06 -0700

On 2011-08-07 16:57, Rolf Turner wrote:



Neither Ted Harding's post,  nor Prof. Ripley's, really appear to
address the OP's follow-up question which was
*why* did he get the ``weird error'' (non-finite finite difference, NaNs
produced) when he applied fitdistr(),
as he was advised to do by Jorge Ivan Velez.  The ``weird error'' did
not occur for Jorge, nor did it occur
for my very good self.  This presumably has something to do with
operating system and/or version of R used,
details of which the OP unfortunately did not provide.


Rolf,

It's not the OS or R version. Close inspection of the error msg
shows that the OP (without mentioning that salient fact) used a
different set of values to fit.

Peter Ehlers


The salient issue is that the answers obtained by applying fitdistr()
directly (without re-scaling)
differ substantially from those obtained after rescaling.  With only 20
data points, any fit is somewhat
suspect.

      cheers,

          Rolf Turner


On 07/08/11 19:17, Prof Brian Ripley wrote:

Well, 'Alexander Engelhardt' failed to follow the posting guide, and
so did not get a reply from some knowledgeable people.

(1) When using fitdistr you need to roughly scale your data: e.g.
divide by 1000 here.

(2) fitdistr does 'go down the maximum likeilhood route' and it does
find reasonable starting points.

The mean is *not* the mle for the rate, and Mr Harding's gamma pdf is
parametrized by 'scale' not 'rate' (see the R help page).  Probably
not its intention, but Harding's posting with its schoolboy errors
exemplifies the inadvisibility of 'rolling your own'.

(3) In the same package there is a function gamma.shape to find the
mle of the shape given the mean.  And the package is support for a
book, and this should be obvious from the book's index.

(4) A value of 2750 for a shape parameter is totally implausible. Have
you any idea at all what gamma(shape = 2750) looks like?

MASS's function gives

b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L ,1699L,
4545L)

fit<- glm(b ~ 1, family=Gamma()
coef(fit)

  (Intercept)
0.0003391958

gamma.shape(fit)


Alpha: 7.786827
SE:    2.411487

At that point you have mles of 1/mean and shape.  The mle of the rate
is then

0.0003391958*7.786827

[1] 0.002641259

It's so much easier to use fitdistr:

fitdistr(b/1000, "gamma")

      shape       rate
   7.7871427   2.6413668
  (2.4115976) (0.8449421)

And as a check of plausibility:
dx<- seq(0, 6000, len = 100)
plot(dx, dgamma(dx, shape=7.8, rate=2.6e-3), type = "l")
rug(b)


On Sun, 7 Aug 2011, ted.hard...@wlandres.net wrote:

On 06-Aug-11 19:37:49, Alexander Engelhardt wrote:

On 08/06/2011 09:23 PM, Jorge Ivan Velez wrote:

Hi Alex,

Try

require(MASS)

Loading required package: MASS

b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,

+ 3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L,
+ 1699L, 4545L)

fitdistr(b, 'gamma')

        shape           rate
    6.4528939045   0.0021887943
   (0.7722567310) (0.0001854559)


Hi,

I'm getting a weird error here, pasted below. Do you know what
could cause that?

fitdistr(b,"gamma")

Error in optim(x = c(2039L, 957L, 2088L, 5966L, 2307L, 2044L,
2353L, 1966L,  :
    non-finite finite-difference value [2]
In addition: Warning message:
In dgamma(x, shape, scale, log) : NaNs produced


I'm surprised no=-one has suggested going straight down the
maximum-likelihood route. The solution for the 'rate' parameter
is immediate, while the solution for the 'shape' paramater
needs the solution of an equation in one variable imvolving
the digamma function psi(k).

Writing the density of the gamma distribution as

  (1/G(k))*(x^(k-1))*exp(x/r)/(r^k)

where k is the shape parameter and r is the rate, and
G(k) denotes the gamma function Gamma(k), we have for
the estimates

[1]  r = mean(x.i)

[2]  psi(k) = mean(log(x.i))

where {x.1,x.2,...,x.n} is the sample, and psi(k) denotes
the digamma function (G'(k))/G(k) = (log(G(k))' (the "'"
denotes first derivative).

So you just need to solve [2] for k.

This is well summarised in the Wikipedia article:

http://en.wikipedia.org/wiki/
Gamma_distribution#Maximum_likelihood_estimation

in the Section "Parameter estimation".

Since the digamma function digamma() is in the base package,
all you need to do is submit it to a root-finder, here done
in the naivest possible way.

With your values of

  x<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L,
        2013L, 3376L, 3363L, 3567L, 4798L, 2032L, 1699L,
        3001L, 2329L, 3944L, 2568L, 1699L, 4545L)

you get

  mean(log(x))
  # [1] 7.92335

which is quite a modest number (though in a somewhat tense
relationship with the digamma function).

A bit of experimentation with plot() will lead you home:

  t<-(50:200);     plot(t,digamma(t))
  t<-(100:2000);   plot(t,digamma(t))
  t<-20*(100:200); plot(t,digamma(t))

from which the root is somewhere near 2750.

  t<-(2700:2800); plot(t,digamma(t))
  t<-(2745:2865); plot(t,digamma(t))
  t<-2755+0.1*(0:100); plot(t,digamma(t))

So:

  mean(log(x)) - digamma(2760)
  # [1] 0.0005452383
  mean(log(x)) - digamma(2769)
  # [1] -0.002710915
  mean(log(x)) - digamma(2765)
  # [1] -0.001265045
  mean(log(x)) - digamma(2763)
  # [1] -0.0005413247
  mean(log(x)) - digamma(2762)
  # [1] -0.0001792682
  mean(log(x)) - digamma(2761)
  # [1] 0.0001829194
  mean(log(x)) - digamma(2761.5)
  # [1] 1.809230e-06
  mean(log(x)) - digamma(2761.75)
  # [1] -8.873357e-05
  mean(log(x)) - digamma(2761.625)
  # [1] -4.34632e-05
  mean(log(x)) - digamma(2761.5625)
  # [1] -2.082724e-05
  mean(log(x)) - digamma(2761.53125)
  # [1] -9.509068e-06
  mean(log(x)) - digamma(2761.515625)
  # [1] -3.849935e-06
  mean(log(x)) - digamma(2761.5078125)
  # [1] -1.020356e-06
  mean(log(x)) - digamma(2761.50390625)
  # [1] 3.944361e-07
  mean(log(x)) - digamma(2761.505859375)
  # [1] -3.129603e-07
  mean(log(x)) - digamma(2761.5048828125)
  # [1] 4.073787e-08

And so it goes ... Anyway, to 7 significant figures,
the answer is that k = 2761.505, and that is "by hand"
using a simple "interval-halving" procedure.

I don't know why fitdistr() threw an error. It may be
that the starting-value for the iterations was to far
off and threw it into some difficult space. The good
starting value above was obtained by a bit of graphical
trial and error. Maybe something similar can be emulated
in R?

Hoping this helps,
Ted.

PS
Oh, and while I'm at it, the estimate of the rate is

  r = mean(x) = 2948.15


--------------------------------------------------------------------
E-Mail: (Ted Harding)<ted.hard...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-11                                       Time: 02:03:29
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Gamma distribution parameter estimation

Reply via email to