Re: [R] subsets

Peter Ehlers Thu, 20 Jan 2011 05:58:03 -0800

On 2011-01-20 02:05, Taras Zakharko wrote:

Hello Den,


your problem is not as simple as it may seem, so Ivan's suggestion is only a
partial answer. I see that each patient can have
more than one diagnosis, and I take it that you want to isolate patients based on
particular combinations of conditions.
Thus, simply looking for "ah" or "ihd" as Ivan suggests will yield patients
who have either of those, but not
necessarily patients who have both.

Instead, what one must do is apply the condition to the whole set of diagnoses
associated with each patient.
I think that is best done with the aggregate function. This function splits
the data according to some
factor (in our case the patient id) and runs a function on each
subset (in our case
a condition test that returns TRUE or FALSE):


# both ah and ihd (run only the call you need; each one overwrites ids)
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && "ihd" %in% x)
# ah but no ihd
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && !("ihd" %in% x))
# ihd but no ah
ids <- aggregate(diagnosis ~ id, df, function(x) !("ah" %in% x) && "ihd" %in% x)

Now, ids will contain a data frame like this (the values shown are only illustrative):

id      diagnosis
1       TRUE
2       FALSE
3       FALSE
...

which shows which patients have the set of diagnoses you asked for. You can
then use these
patient ids to filter the original data with something like:

subset(df, id %in% subset(ids, diagnosis)$id)

This first picks the patients in the 'ids' data frame for which the
condition holds, and then extracts all of their
rows from the original 'df' data frame.
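
For illustration, here is a minimal sketch of the whole procedure on the sample data from the original post. The helper name patients_where and the result names both, ah.only and ihd.only are made up for this example and are not part of the suggestion above; the data frame is typed in from the listing in the question, which is truncated there, so only ids 1 to 5 are covered.

```r
# Sample data copied from the original post (only the rows shown there).
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 'test' gets the vector of diagnoses of one patient
# and must return a single TRUE or FALSE.
patients_where <- function(df, test) {
  # One row per patient; the 'diagnosis' column now holds the logical flag.
  ids <- aggregate(diagnosis ~ id, df, test)
  # Keep every row of df that belongs to a flagged patient.
  df[df$id %in% ids$id[ids$diagnosis], ]
}

both     <- patients_where(df, function(x) "ah" %in% x && "ihd" %in% x)
ah.only  <- patients_where(df, function(x) "ah" %in% x && !("ihd" %in% x))
ihd.only <- patients_where(df, function(x) !("ah" %in% x) && "ihd" %in% x)
```

Each result keeps all diagnoses of the selected patients, not just the rows with "ah" or "ihd", which is what the per-patient test is meant to achieve.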

Hope it helps,

Taras


Here's a tidy version using the plyr package:

library(plyr)
df1 <- ddply(df, .(id), summarize,
     has.both = ("ah" %in% diagnosis) & ("ihd" %in% diagnosis),
     has.only.ah = ("ah" %in% diagnosis) & !("ihd" %in% diagnosis),
     has.only.ihd = !("ah" %in% diagnosis) & ("ihd" %in% diagnosis)
)

Further processing on the columns of df1 is straightforward.
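
As one possible sketch of that further processing, the flag columns can be used to pull the matching rows out of the original data. To keep the example free of package dependencies, df1 is rebuilt here in base R with the same column names as in the ddply call; that rebuild, and the result names both, only.ah and only.ihd, are assumptions made for this illustration. The data are the rows listed in the original question.

```r
# Sample data copied from the original post (only the rows shown there).
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Base R stand-in for the ddply result: one row per patient, three flags.
by_id <- split(df$diagnosis, df$id)
df1 <- data.frame(
  id = as.numeric(names(by_id)),
  has.both = vapply(by_id, function(x) "ah" %in% x && "ihd" %in% x,
                    logical(1)),
  has.only.ah = vapply(by_id, function(x) "ah" %in% x && !("ihd" %in% x),
                       logical(1)),
  has.only.ihd = vapply(by_id, function(x) !("ah" %in% x) && "ihd" %in% x,
                        logical(1))
)

# Use each flag column to select the rows of the matching patients.
both     <- df[df$id %in% df1$id[df1$has.both], ]
only.ah  <- df[df$id %in% df1$id[df1$has.only.ah], ]
only.ihd <- df[df$id %in% df1$id[df1$has.only.ihd], ]
```

Having all three flags side by side in df1 means the grouping is computed once, and each data set is then a simple lookup.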

Peter Ehlers

On Jan 20, 2011, at 9:53 , Den wrote:

Dear R people
Could you please help.

Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like

id      diagnosis
1       ah
2       ah
2       ihd
2       im
3       ah
3       stroke
4       ah
4       ihd
4       angina
5       ihd
..............
Q: How to make three data sets:
        1. Patients with ah and ihd
        2. Patients with ah but no ihd
        3. Patients with  ihd but no ah?

If you have any ideas, could you just point me to what I should look for? Is it
subset, or aggregate, or loops, or something else? I am a bit lost. (F1
F1 F1 !!!:)
Thank you

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

