Your example code is asserting that the events occurred at the times in 'obs.time', not before those times.
Surv(event = 1) means uncensored. If you try event = 0, the fitter diverges towards an exact fit, as I said it should. Yes, you will get a good fit to the ECDF, but you are modelling the observation times, not the event times (and asserting that a continuous distribution fits discrete data). I suggest you seek local statistical help: these are conceptual rather than R issues. (BTW, using a .com address suggests you are a COMpany and in the absence of a signature block that will influence how much free help you are offered.) On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote: > Dear Prof Ripley and Dimitris (cc R-list), > > thank you very much for your very insightful responses. > > I've been checking how to use survreg{survival} to fit a left-censored > lognormal distribution, and I was surprised to find that results are exactly > the same as with fitdistr{MASS}. Here is an example with a larger dataset: > > #==================== > # data: > data1 <- data.frame(obs.time=c(30,31.98,35.95,37.2,46.4,50.5,54,56,60,62.95, > 69.4,71.5,74,76,78,79.25,81.1,84.68,90,95.37,100), > reps=c(5,47,80,18,20,32,29,8,29,2,7,8,1,4,3,2,3,3,1,2,1)) > > data2 <- with(data1, data.frame(obs.time=rep(obs.time, reps), > event=rep(1,sum(data1$reps)))) > > # 1. using fitdistr to estimate lognormal parameters > library(MASS) > fit1 <- fitdistr(data2$obs.time, "lognormal"); fit1 > # meanlog sdlog > # 3.80552970 0.29093294 > # (0.01665877) (0.01177953) > > # 2. using survreg for left-censored estimation > library(survival) > fm <- survreg(Surv(obs.time,event,type="left")~1, > data=data2, dist="lognormal"); summary(fm) > fm$coefficients[[1]]; fm$scale > # [1] 3.80553 > # [1] 0.2909329 > # results are the identical !! > > # plotting ecdf and the fitted curve ========= > # parameters > fit1.mean <- fit1$estimate[[1]] > fit1.sd <- fit1$estimate[[2]] > > plot(ecdf(data2$obs.time), verticals=TRUE, do.p=FALSE, lwd=1.5, > xlim=c(0,116), ylab="pb", xlab="time") > curve(plnorm(x, meanlog=fit1.mean, sdlog=fit1.sd), add=TRUE, col='red', > lwd=3) > # the fit looks good to me > > #============= end script ========================= > > I guess this relates to the following sentences from Prof Ripley: > > ML fitting can be done for censored data. However, I don't think > you have a valid description here: it seems you never recorded a time at > which the event had not happened, and the most likely fit is a probability > mass at zero (since this is a perfect explanation for your data). > > (which I don't fully understand) and > > If you have an ECDF, the jumps give you the data so you can just use > fitdistr(). > > I have the ECDF and the number of observations is ~150-300. My final > understanding is therefore that fitdistr can be used to fit distributions by > ML over left-censored data, which is equivalent to fit the distribution > using cumulative probabilities (to go back to my original terminology). > > Is this understanding correct? I would highly appreciate any feedback, > especially if I am misunderstanding the way to estimate the function > parameters for this type of data. > > Thank you very much! > > Ahimsa > > > > > On Sat, Feb 23, 2008 at 2:35 AM, Prof Brian Ripley <[EMAIL PROTECTED]> > wrote: > >> On Sat, 23 Feb 2008, ahimsa campos-arceiz wrote: >> >>> Dear all, >>> >>> I'm trying to estimate the parameters of a lognormal distribution fitted >>> from some data. >>> >>> The tricky thing is that my data represent the time at which I recorded >>> certain events. However, in many cases I don't really know when the >> event >>> happened. I' only know the time at which I recorded it as already >> happened. >> >> So this is a rather extreme form of censoring. >> >>> Therefore I want to fit the lognormal from the cumulative distribution >>> function (cdf) rather than from the probability distribution function >> (pdf). >>> >>> My understanding is that methods based on Maximum Likelihood (e.g. >> fitdistr >>> {MASS}) are based on the pdf. Nonlinear least-squares methods seem to be >>> based on the cdf... however I was unable to use nls{stat} for lognormal. >> >> Not so: ML fitting can be done for censored data. However, I don't think >> you have a valid description here: it seems you never recorded a time at >> which the event had not happened, and the most likely fit is a probability >> mass at zero (since this is a perfect explanation for your data). >> >> To make any progress with censoring, you need to see both positive and >> negative events. If you told us that none of these events happened before >> t=15, it would be possible to fit the model (although you would need far >> more data to get a good fit). >> >> Generally code to handle censoring is in survival analysis: e.g. survreg() >> in package survival. In the terminiology of the latter, all your >> observations are left-censored. >> >>> I found a website that explains how to fit univariate distribution >> functions >>> based on cumulative probabilities, including a lognormal example, in >> Matlab: >>> >> http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/cdffitdemo.html >>> >>> and other programs like TableCurve 2D seem to do this too. >> >> Maybe, but that is a different problem. If you have an ECDF, the jumps >> give you the data so you can just use fitdistr(). (And you will see >> comparing observed and fitted CDFs in MASS, the book.) >> >> >>> There must be a straightforward method in R which I have overlooked. Any >>> suggestion on how can I estimate these parameters in R or helpful >> references >>> are very much appreciated. >>> >>> (not sure if it helps but) here is an example of my type of data: >>> >>> treat.1 <- c(21.67, 21.67, 43.38, 35.50, 32.08, 32.08, 21.67, 21.67, >> 41.33, >>> 41.33, 41.33, 32.08, 21.67, 22.48, 23.25, 30.00, 26.00, 19.37, >> 26.00 >>> , >>> 32.08, 21.67, 26.00, 26.00, 43.38, 26.00, 21.67, 22.48, 35.50, >> 38.30, >>> >>> 32.08) >>> >>> treat.2 <- c(35.92, 12.08, 12.08, 30.00, 33.cy73, 35.92, 12.08, 30.00, >> 56.00, >>> 30.00, 35.92, 33.73, 12.08, 26.00, 54.00, 12.08, 12.08, 35.92, >> 35.92 >>> , >>> 12.08, 33.73, 35.92, 63.20, 30.00, 26.00, 33.73, 23.50, 30.00, >> 35.92 >>> , >>> 30.00) >>> >>> Thank you very much! >>> >>> Ahimsa >>> >>> >>> -- >>> ahimsa campos-arceiz >>> www.camposarceiz.com >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> -- >> Brian D. Ripley, [EMAIL PROTECTED] >> Professor of Applied Statistics, >> http://www.stats.ox.ac.uk/~ripley/<http://www.stats.ox.ac.uk/%7Eripley/> >> University of Oxford, Tel: +44 1865 272861 (self) >> 1 South Parks Road, +44 1865 272866 (PA) >> Oxford OX1 3TG, UK Fax: +44 1865 272595 >> > > > > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.