Well, 'Alexander Engelhardt' failed to follow the posting guide, and
so did not get a reply from some knowledgeable people.
(1) When using fitdistr you need to roughly scale your data: e.g.
divide by 1000 here.
(2) fitdistr does 'go down the maximum likeilhood route' and it does
find reasonable starting points.
The mean is *not* the mle for the rate, and Mr Harding's gamma pdf is
parametrized by 'scale' not 'rate' (see the R help page). Probably
not its intention, but Harding's posting with its schoolboy errors
exemplifies the inadvisibility of 'rolling your own'.
(3) In the same package there is a function gamma.shape to find the
mle of the shape given the mean. And the package is support for a
book, and this should be obvious from the book's index.
(4) A value of 2750 for a shape parameter is totally implausible.
Have you any idea at all what gamma(shape = 2750) looks like?
MASS's function gives
b <- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L ,1699L,
4545L)
fit <- glm(b ~ 1, family=Gamma()
coef(fit)
(Intercept)
0.0003391958
gamma.shape(fit)
Alpha: 7.786827
SE: 2.411487
At that point you have mles of 1/mean and shape. The mle of the
rate is then
0.0003391958*7.786827
[1] 0.002641259
It's so much easier to use fitdistr:
fitdistr(b/1000, "gamma")
shape rate
7.7871427 2.6413668
(2.4115976) (0.8449421)
And as a check of plausibility:
dx <- seq(0, 6000, len = 100)
plot(dx, dgamma(dx, shape=7.8, rate=2.6e-3), type = "l")
rug(b)
On Sun, 7 Aug 2011, ted.hard...@wlandres.net wrote:
On 06-Aug-11 19:37:49, Alexander Engelhardt wrote:
On 08/06/2011 09:23 PM, Jorge Ivan Velez wrote:
Hi Alex,
Try
require(MASS)
Loading required package: MASS
b<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L, 2013L, 3376L,
+ 3363L, 3567L, 4798L, 2032L, 1699L, 3001L, 2329L, 3944L, 2568L,
+ 1699L, 4545L)
fitdistr(b, 'gamma')
shape rate
6.4528939045 0.0021887943
(0.7722567310) (0.0001854559)
Hi,
I'm getting a weird error here, pasted below. Do you know what
could cause that?
> fitdistr(b,"gamma")
Error in optim(x = c(2039L, 957L, 2088L, 5966L, 2307L, 2044L,
2353L, 1966L, :
non-finite finite-difference value [2]
In addition: Warning message:
In dgamma(x, shape, scale, log) : NaNs produced
I'm surprised no=-one has suggested going straight down the
maximum-likelihood route. The solution for the 'rate' parameter
is immediate, while the solution for the 'shape' paramater
needs the solution of an equation in one variable imvolving
the digamma function psi(k).
Writing the density of the gamma distribution as
(1/G(k))*(x^(k-1))*exp(x/r)/(r^k)
where k is the shape parameter and r is the rate, and
G(k) denotes the gamma function Gamma(k), we have for
the estimates
[1] r = mean(x.i)
[2] psi(k) = mean(log(x.i))
where {x.1,x.2,...,x.n} is the sample, and psi(k) denotes
the digamma function (G'(k))/G(k) = (log(G(k))' (the "'"
denotes first derivative).
So you just need to solve [2] for k.
This is well summarised in the Wikipedia article:
http://en.wikipedia.org/wiki/
Gamma_distribution#Maximum_likelihood_estimation
in the Section "Parameter estimation".
Since the digamma function digamma() is in the base package,
all you need to do is submit it to a root-finder, here done
in the naivest possible way.
With your values of
x<- c(2039L, 2088L, 5966L, 2353L, 1966L, 2312L, 3305L,
2013L, 3376L, 3363L, 3567L, 4798L, 2032L, 1699L,
3001L, 2329L, 3944L, 2568L, 1699L, 4545L)
you get
mean(log(x))
# [1] 7.92335
which is quite a modest number (though in a somewhat tense
relationship with the digamma function).
A bit of experimentation with plot() will lead you home:
t<-(50:200); plot(t,digamma(t))
t<-(100:2000); plot(t,digamma(t))
t<-20*(100:200); plot(t,digamma(t))
from which the root is somewhere near 2750.
t<-(2700:2800); plot(t,digamma(t))
t<-(2745:2865); plot(t,digamma(t))
t<-2755+0.1*(0:100); plot(t,digamma(t))
So:
mean(log(x)) - digamma(2760)
# [1] 0.0005452383
mean(log(x)) - digamma(2769)
# [1] -0.002710915
mean(log(x)) - digamma(2765)
# [1] -0.001265045
mean(log(x)) - digamma(2763)
# [1] -0.0005413247
mean(log(x)) - digamma(2762)
# [1] -0.0001792682
mean(log(x)) - digamma(2761)
# [1] 0.0001829194
mean(log(x)) - digamma(2761.5)
# [1] 1.809230e-06
mean(log(x)) - digamma(2761.75)
# [1] -8.873357e-05
mean(log(x)) - digamma(2761.625)
# [1] -4.34632e-05
mean(log(x)) - digamma(2761.5625)
# [1] -2.082724e-05
mean(log(x)) - digamma(2761.53125)
# [1] -9.509068e-06
mean(log(x)) - digamma(2761.515625)
# [1] -3.849935e-06
mean(log(x)) - digamma(2761.5078125)
# [1] -1.020356e-06
mean(log(x)) - digamma(2761.50390625)
# [1] 3.944361e-07
mean(log(x)) - digamma(2761.505859375)
# [1] -3.129603e-07
mean(log(x)) - digamma(2761.5048828125)
# [1] 4.073787e-08
And so it goes ... Anyway, to 7 significant figures,
the answer is that k = 2761.505, and that is "by hand"
using a simple "interval-halving" procedure.
I don't know why fitdistr() threw an error. It may be
that the starting-value for the iterations was to far
off and threw it into some difficult space. The good
starting value above was obtained by a bit of graphical
trial and error. Maybe something similar can be emulated
in R?
Hoping this helps,
Ted.
PS
Oh, and while I'm at it, the estimate of the rate is
r = mean(x) = 2948.15
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-11 Time: 02:03:29
------------------------------ XFMail ------------------------------
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.