Re: [R] [r] How to pick colums from a ragged array?

Rui Barradas Tue, 23 Oct 2012 04:24:03 -0700

Hello,

Thinking again, if you just want the rows of each ID whose first (or last) DATE is repeated, the following function does the job. Since there were no such cases in your data example, I've added 3 rows to the dataset.


ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
,547,794,814,814,814,814,814,814,841,841,841,841,841
,841,841,841,841,910,910,910,910,910,910,910,910,999,1019,1019
,1019,1019)

DATE <- c(20060821,20061207,20080102,20090904,20040205,20040323,20051111
,20060111,20071119,20080107,20080407,20080521,20080711,20041005
,20070905,20020814,20021125,20040429,20040429,20071205,20080227
,20050421,20060130,20060428,20060602,20060816,20061025,20061129
,20070112,20070514,20091105,20091105,20091117,20091119,20091120,20091210
,20091224,20091224,20050503,19870508,19880223,19880330,19880330)

id.d <- cbind(ID, DATE)


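getRepeat <- function(x, first = TRUE){
    # head() picks the first DATE per ID, tail() the last one; this assumes
    # the rows are already sorted by DATE within each ID
    fun <- if(first) head else tail
    sp <- split(data.frame(x), x[,1])                  # one data frame per ID
    first.date <- tapply(x[,2], x[,1], FUN = fun, 1)   # first (or last) DATE per ID
    # One logical vector per ID, flagging the rows on that DATE
    lst <- lapply(seq_along(sp), function(j) sp[[j]][,2] == first.date[j])
    n <- unlist(lapply(lst, sum))                      # number of rows on that DATE
    sp1 <- sp[n > 1]                                   # IDs where that DATE repeats
    i1 <- lst[n > 1]
    lapply(seq_along(sp1), function(j) sp1[[j]][i1[[j]], ])  # the repeated rows
}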

getRepeat(id.d)  # defaults to first = TRUE
getRepeat(id.d, first = FALSE)  # to get the last ones
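If what you finally need is the set of IDs to discard rather than the repeated rows themselves, here is a minimal sketch of a variant. It uses min()/max() instead of head()/tail(), so it does not rely on the rows being sorted by DATE within each ID. The helper name `ambiguousIDs` and the `toy` matrix (a few rows of the example data, deliberately shuffled) are hypothetical.

```r
# Hypothetical helper: IDs whose earliest (or latest) DATE appears on more than one row
ambiguousIDs <- function(x, first = TRUE) {
  fun <- if (first) min else max
  # For every row, the earliest (or latest) DATE of that row's ID;
  # min()/max() do not depend on how the rows are ordered
  target <- ave(x[, "DATE"], x[, "ID"], FUN = fun)
  # TRUE for rows that fall on that extreme DATE
  on.target <- x[, "DATE"] == target
  # Rows on the extreme DATE per ID; more than one means an ambiguous diagnosis
  n <- tapply(on.target, x[, "ID"], sum)
  as.numeric(names(n)[n > 1])
}

# Rows taken from the example data, deliberately not sorted by DATE
toy <- cbind(
  ID   = c(814, 814, 814, 910, 910, 910, 1019, 1019, 1019),
  DATE = c(20040429, 20020814, 20040429,
           20091224, 20091105, 20091105,
           19880330, 19880223, 19880330)
)

ambiguousIDs(toy)                 # 910: duplicated first DATE
ambiguousIDs(toy, first = FALSE)  # 814 1019: duplicated last DATE
# Keep only patients whose first diagnosis is unambiguous
first.ok <- toy[!toy[, "ID"] %in% ambiguousIDs(toy), ]
```

The same call with `first = FALSE` gives the IDs to drop before grouping by last diagnosis.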


Hope this helps,

Rui Barradas


On 23-10-2012 10:59, Rui Barradas wrote:

Hello,
I'm not sure I understand it well: in the solution below, the only returned value is ID == 814, but its repeated DATE is neither the first nor the last one.
how.many <- ave(id.d[,1], id.d[,1], id.d[,2], FUN = length)
id.d[how.many > 1, ]
See the help page ?ave if the repetition of id.d[,1] is confusing. The first one is the vector to average (the one FUN is applied to) and the second one is one of the two vectors defining the groups.
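A toy illustration of that argument order, with made-up vectors `v`, `g1` and `g2`: the first argument supplies the values handed to FUN, and every further unnamed argument is a grouping vector, so `length` counts the rows in each (g1, g2) combination.

```r
v  <- c(1, 1, 1, 2)          # values passed to FUN (only their count matters here)
g1 <- c("a", "a", "a", "b")  # first grouping vector, playing the role of ID
g2 <- c(10, 10, 20, 10)      # second grouping vector, playing the role of DATE
counts <- ave(v, g1, g2, FUN = length)
counts
# [1] 2 2 1 1
```

Each row gets the size of its own group, which is why `how.many > 1` picks out every row of a repeated ID/DATE pair.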
Hope this helps,

Rui Barradas
Em 23-10-2012 10:37, Stuart Leask escreveu:
I have a large dataset (~1 million rows) of three variables: ID (patient's name), DATE (of appointment) and DIAGNOSIS (given on that date). Patients may have been assigned more than one diagnosis at any one appointment, leading to two rows with the same ID and DATE but different DIAGNOSIS.
The diagnoses may change between appointments.

I want to subset the data in two ways:

-          define groups of patients by the first diagnosis given

-          define groups of patients by the last diagnosis given.
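One way to form those groups, sketched on a small made-up data frame `appts` (the DIAGNOSIS codes "A", "B", "C" are invented placeholders) and assuming the patients with an ambiguous first or last appointment have already been removed: order by ID and DATE, then keep the first or last row of each ID with duplicated().

```r
# Made-up appointments; DIAGNOSIS codes are placeholders
appts <- data.frame(
  ID        = c(58, 58, 58, 167, 167),
  DATE      = c(20080102, 20060821, 20061207, 20040323, 20040205),
  DIAGNOSIS = c("B", "A", "B", "A", "C"),
  stringsAsFactors = FALSE
)
# Sort so that, within each ID, rows run from earliest to latest DATE
appts <- appts[order(appts$ID, appts$DATE), ]
# First and last appointment per ID
first.dx <- appts[!duplicated(appts$ID), ]
last.dx  <- appts[!duplicated(appts$ID, fromLast = TRUE), ]
# Patients grouped by first (or last) diagnosis: a list of ID vectors
by.first <- split(first.dx$ID, first.dx$DIAGNOSIS)
by.last  <- split(last.dx$ID, last.dx$DIAGNOSIS)
```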

The problem:
Unfortunately, a small number of patients have been given more than one diagnosis at their first (or last) appointment. I need to identify and remove these individuals, as it's not possible to say uniquely what their first (or last) diagnosis was. So I need to identify and remove the individuals who have pairs of rows with the same ID and (lowest or highest) DATE. The size of the dataset precludes the option of doing this by eye.
I suspect there is a very elegant way of doing this in R.

This is what I've come up with:


-          Sort by DATE then ID

-          Make a ragged array of DATE by ID

-          Remove IDs that only occur once.
- Subtract the first and second DATEs. Remove IDs for which the difference is zero, as this will only be true for IDs whose appointment is recorded twice (because there were two diagnoses recorded on that date).
- (Then do the same to get the 'last appointment' duplicates, by reversing the initial sort by DATE.)
I am stuck at the 'Subtract dates' step: I would like to get the data out of the ragged array by columns (so e.g. I end up with a matrix of ID, 1st DATE, 2nd DATE). But I can't get the dates out by column from the ragged array.
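A sketch of one way past that step, assuming the ragged array is a list of DATE vectors as returned by split(), each already sorted by DATE: the k-th "column" is the k-th element of every component, which sapply() with the `[` function extracts. The small list `rag` is made up for illustration.

```r
# Made-up ragged list: one sorted DATE vector per ID
rag <- list(`58`  = c(20060821, 20061207, 20080102),
            `547` = 20041005,
            `814` = c(20040429, 20040429, 20071205))
# Keep IDs with at least two dates; otherwise [2] would give NA
rag2 <- rag[sapply(rag, length) > 1]
# Column k of the ragged array = k-th element of every component
dates <- cbind(ID    = as.numeric(names(rag2)),
               DATE1 = sapply(rag2, `[`, 1),
               DATE2 = sapply(rag2, `[`, 2))
# IDs whose first two DATEs coincide (a zero difference)
tied <- dates[dates[, "DATE2"] - dates[, "DATE1"] == 0, "ID"]
```

For the last appointment, `function(d) d[length(d) - 1]` and `function(d) d[length(d)]` in place of `[` would pick the last two dates without re-sorting.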
I hope someone can help. My ugly code is below, with some data for testing.
Stuart


Dr Stuart John Leask DM FRCPsych MB BChir MA
Clinical Senior Lecturer and Honorary Consultant Psychiatrist
Institute of Mental Health, Innovation Park
Triumph Road, Nottingham, Notts. NG7 2TU. UK
Tel. +44 115 82 30419
stuart.le...@nottingham.ac.uk
Google 'Dr Stuart Leask'


ID <- c(58,58,58,58,167,167,323,323,323,323,323,323,323
,547,794,814,814,814,814,814,814,841,841,841,841,841
,841,841,841,841,910,910,910,910,910,910,999,1019,1019
,1019)

DATE <- c(20060821,20061207,20080102,20090904,20040205,20040323,20051111
,20060111,20071119,20080107,20080407,20080521,20080711,20041005
,20070905,20020814,20021125,20040429,20040429,20071205,20080227
,20050421,20060130,20060428,20060602,20060816,20061025,20061129
,20070112,20070514,20091105,20091117,20091119,20091120,20091210
,20091224,20050503,19870508,19880223,19880330)

id.d <- cbind(ID, DATE)
rag.a <- split(id.d[, 2], id.d[, 1])  # create ragged array: 1-n DATEs for every ID
# Inelegant attempt to remove IDs that only have one entry:

rag.s <- tapply(id.d[, 2], id.d[, 1], sum)  # add up the dates per ID
# Since DATE is in 'yyyymmdd' form, if there's only one date the sum will be less than 21000000:
rag.t <- rag.s[rag.s > 21000000]
multi.dates <- rownames(rag.t)  # all the IDs with > 1 date
rag.am <- rag.a[multi.dates]    # rag.am only has IDs with > 1 date
# But now I'm stuck.
# Each row of the array is rag.am$ID.
# So I can't pick columns of DATEs from the ragged array.
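The sum-of-dates test above works because a single yyyymmdd date is below 21000000 while any two realistic dates add up to more, but counting the elements directly is less fragile. A sketch with base R's lengths() (available since R 3.2.0), on a made-up ragged list standing in for rag.a:

```r
# Made-up ragged list standing in for rag.a (one DATE vector per ID)
rag.a <- list(`547` = 20041005,
              `58`  = c(20060821, 20061207),
              `999` = 20050503)
# Number of dates recorded for each ID
n.dates <- lengths(rag.a)
# IDs with more than one date, then the matching part of the ragged list
multi.dates <- names(rag.a)[n.dates > 1]
rag.am <- rag.a[multi.dates]
```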

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


