Rolf,

I was able to essentially reproduce your problems. Also, when I opened the ".xls" file with Excel, I got the error message "file error: data may have been lost". When I saved the file as .csv and read it into R, I found that the data set has only 502 records, whereas the original data set of Andrews and Herzberg (from statlib) has 506 records. Maybe this is related to the error about data being lost.
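A minimal sketch of the record-count comparison described above, in R. The helper name check_record_count and the toy file are hypothetical; the demonstration runs on throwaway data, not on the real prostate file.

```r
# Hypothetical helper: compare the number of records read from a CSV
# against the count reported for a reference copy of the data.
check_record_count <- function(csv_path, expected) {
  dat <- read.csv(csv_path, stringsAsFactors = FALSE)
  list(found = nrow(dat),
       expected = expected,
       missing = expected - nrow(dat))
}

# Toy demonstration on a temporary file (made-up values, 5 rows).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(patno = 1:5, stage = c(3, 4, 3, 4, 3)),
          tmp, row.names = FALSE)

res <- check_record_count(tmp, expected = 7)
print(res$missing)
```

For a CSV saved from prostate.xls (the file name "prostate.csv" is an assumption), check_record_count("prostate.csv", expected = 506) would report 4 missing records if the file does hold 502 rows.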
Of course, I don't know what the real "original" data set is. I increasingly find it frustrating to reproduce the results reported in journal articles because the data sets and their sources are sloppily documented.

Ravi.

----------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvarad...@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rolf Turner
Sent: Tuesday, March 24, 2009 5:50 PM
To: R-help Forum
Subject: Re: [R] Green and Byar (1980) Prostate Cancer Data set from Andrews and Herzberg - Data

On 25/03/2009, at 10:04 AM, Frank E Harrell Jr wrote:

> Ravi Varadhan wrote:
>> Hi,
>>
>> I am looking for a data set containing the information from a
>> randomized trial evaluating the effect of DES (diethylstilbestrol) on
>> multiple time-to-event endpoints, prostate cancer, CVD, and other
>> causes. The original source of this data is Green and Byar (1980).
>> This is a popular competing risks problem that has subsequently been
>> discussed in a number of statistical papers including Kay (1986).
>>
>> Does anyone have a digital version of this data set?
>>
>> This data is also presented in Andrews, D. F. and Herzberg, A. M.
>> (1985). Data. Does a digital version of all the data sets in A &
>> H exist?
>>
>> Thanks very much,
>> Ravi.
>
> An R binary dataset is at http://biostat.mc.vanderbilt.edu/Datasets
>
> Note that there is something strange about the AP variable with a lot
> of ties at some value near 1.0. I have never been able to find any
> documentation about this problem. If you find any please let me know.

Out of idle curiosity I went to have a look at this data set. I had problems.

(1) The given URL didn't work for me; when I clicked on it, I got an error 404. But if I went to http://biostat.mc.vanderbilt.edu I found a link to ``Datasets'', and clicking on that got me to some data sets.

(2) Scrolling down to ``Byar and Green prostate cancer data'' appeared to get me to the right place. But I couldn't see any signs of any ``R binary files''. The available formats appear to be *.sav (SPSS?), *.sdd (???), and *.xls.

(3) I downloaded the prostate.xls file O.K. But when I tried to read it in with the read.xls() function from the gdata package, I got an error to the effect

    > X <- read.xls("prostate.xls")
    Converting xls file to csv file... Done.
    Reading csv file... Error in read.table(file = file, header = header, sep = sep, quote = quote, :
      no lines available in input

I was able to ``open'' the prostate.xls file with the version of Excel available on my Mac, save it as a *.csv file, and then read *that* in with read.csv().

What am I missing? *Are* there ``R binary'' files lurking about that I am somehow not seeing? Why won't read.xls() work on this data set?

    cheers,

        Rolf Turner

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
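
On the read.xls() error quoted in point (3): the message "no lines available in input" is what read.table() reports when it is handed a file with nothing in it, which would be consistent with the xls-to-csv conversion step having written an empty file. Whether that is what actually happened with prostate.xls is an assumption; the sketch below only reproduces the message in base R and shows a guard that reports the empty file explicitly. The helper name safe_read_csv is hypothetical.

```r
# Hypothetical helper: read a CSV, but report a missing or empty file
# explicitly instead of letting read.table() fail.
safe_read_csv <- function(path) {
  if (!file.exists(path) || file.info(path)$size == 0) {
    warning("'", path, "' is missing or empty; nothing to read")
    return(NULL)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# An empty temporary file stands in for an empty intermediate CSV.
empty <- tempfile(fileext = ".csv")
file.create(empty)

# Reading the empty file directly: capture the error message as a string.
msg <- tryCatch(read.csv(empty), error = function(e) conditionMessage(e))
print(msg)

# The guarded version returns NULL (with a warning) instead of stopping.
out <- suppressWarnings(safe_read_csv(empty))
```

If the guard reports an empty file, the problem lies in the conversion step rather than in read.csv() itself, which fits the observation that a CSV saved from Excel reads in without trouble.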