Re: [R] Removing & generating data by category

Adaikalavan Ramasamy Thu, 29 Oct 2009 04:31:39 -0700

Here is another way based on pasting ids as hinted below:


a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),
                   c("A3","A2","A3","A4","A5")),
                   loc=c("B1","B2","B3","B4","B5"),
                   clm=c(rep(("General"),6),rep("Life",4)))

a$uid <- paste(a$id, ".", a$loc, sep="")

out <- tapply( a$clm, a$uid, paste ) # can also add collapse=","
out
$A1.B1
[1] "General"

$A2.B2
[1] "General" "Life"

$A3.B1
[1] "General"

$A3.B3
[1] "General" "Life"

$A4.B4
[1] "General" "Life"

$A5.B5
[1] "General" "Life"


Then here are those with single policies.

> out[ which( sapply(out, length) == 1 ) ]
$A1.B1
[1] "General"

$A3.B1
[1] "General"
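
If the goal is to get the original rows back rather than a list, one possible sketch is to count the rows per pasted key with table() and subset on that. The names n_per_uid and single are only illustrative, and this assumes an id/loc pair never repeats within the same clm category (otherwise count the distinct clm values per key instead).

```r
# Rebuild the example data from the post (loc is recycled to length 10).
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))
a$uid <- paste(a$id, a$loc, sep = ".")

# Count how many rows share each id/loc key.
n_per_uid <- table(a$uid)

# Keep only the rows whose key occurs exactly once (assumption: a key is
# never repeated within one clm category).
single <- a[a$uid %in% names(n_per_uid)[n_per_uid == 1], c("id", "loc", "clm")]
single
```

This avoids an explicit loop over rows, which may matter for a data set of the size mentioned in the original question.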



David Winsemius wrote:

On Oct 28, 2009, at 9:30 PM, Steven Kang wrote:

Dear R users,


Basically, from the following arbitrary data set:

a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),
                     c("A3","A2","A3","A4","A5")),
                loc=c("B1","B2","B3","B4","B5"),
                clm=c(rep(("General"),6),rep("Life",4)))
a
   id loc     clm
1  A1  B1 General
2  A2  B2 General
3  A3  B3 General
4  A4  B4 General
5  A5  B5 General
6  A3  B1 General
7  A2  B2    Life
8  A3  B3    Life
9  A4  B4    Life
10 A5  B5    Life
I would like to remove the records (highlighted above) that have identical
values in the fields "id" and "loc" but a different value of "clm" (i.e.
according to category).

Take a look at this merge operation on separate rows of "a".
> merge( a[a$clm=="Life", ], a[a$clm=="General", ], by=c("id","loc"), all=TRUE)
  id loc clm.x   clm.y
1 A1  B1  <NA> General
2 A2  B2  Life General
3 A3  B1  <NA> General
4 A3  B3  Life General
5 A4  B4  Life General
6 A5  B5  Life General
Assignment of that object and selection with is.na should complete the process.
> a2m <- merge( a[a$clm=="Life", ], a[a$clm=="General", ], by=c("id", "loc"), all=TRUE)
> a2m[ is.na(a2m$clm.x) | is.na(a2m$clm.y), ]
  id loc clm.x   clm.y
1 A1  B1  <NA> General
3 A3  B1  <NA> General
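
To get from the merged result back to the id/loc/clm layout of the desired output, one possible sketch is to fold clm.x and clm.y into a single column. The names unmatched and result are only illustrative, and the as.character() calls are there on the assumption that clm may be a factor.

```r
# Rebuild the example data from the post (loc is recycled to length 10).
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))

# Full outer join of the "Life" rows against the "General" rows.
a2m <- merge(a[a$clm == "Life", ], a[a$clm == "General", ],
             by = c("id", "loc"), all = TRUE)

# Rows present in only one category have an NA on one side.
unmatched <- a2m[is.na(a2m$clm.x) | is.na(a2m$clm.y), ]

# Fold the two clm columns into one, taking whichever side is not NA.
unmatched$clm <- ifelse(is.na(unmatched$clm.x),
                        as.character(unmatched$clm.y),
                        as.character(unmatched$clm.x))
result <- unmatched[, c("id", "loc", "clm")]
result
```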
Alternate methods might include paste-ing id to loc and removing duplicates.
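
A minimal sketch of that paste-and-duplicated idea, offered as one possible reading of the suggestion rather than the exact code intended: key, dup and unique_rows are illustrative names, and it assumes every repeated id/loc pair should be dropped regardless of which categories it appears in.

```r
# Rebuild the example data from the post (loc is recycled to length 10).
a <- data.frame(id  = c("A1","A2","A3","A4","A5","A3","A2","A3","A4","A5"),
                loc = c("B1","B2","B3","B4","B5"),
                clm = c(rep("General", 6), rep("Life", 4)))

# Paste id and loc into a single key per row.
key <- paste(a$id, a$loc, sep = ".")

# duplicated() flags only the second and later occurrences, so scan from
# both ends to flag every row whose key appears more than once.
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep the rows whose key is unique.
unique_rows <- a[!dup, ]
unique_rows
```

Both paste() and duplicated() are vectorised, so no row-by-row comparison is needed.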
i.e.
categ <- table(a$id,a$clm)
categ
    General Life
 A1       1    0
 A2       1    1
 A3       2    1
 A4       1    1
 A5       1    1

The desired output is

   id loc     clm
1  A1  B1 General
6  A3  B1 General

Because the data set I am working on is quite big (~ 800,000 x 20),
with the majority of the field values being long strings, looping turned out
to be very inefficient for comparing individual rows.
Are there any alternative, more efficient methods for solving this problem?
Steven


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
