Here is another way based on pasting ids as hinted below:

a <- data.frame(id=c(c("A1","A2","A3","A4","A5"),
                   c("A3","A2","A3","A4","A5")),
                   loc=c("B1","B2","B3","B4","B5"),
                   clm=c(rep(("General"),6),rep("Life",4)))

a$uid <- paste(a$id, ".", a$loc, sep="")

out <- tapply( a$clm, a$uid, paste ) # can also add collapse=","
$A1.B1
[1] "General"

$A2.B2
[1] "General" "Life"

$A3.B1
[1] "General"

$A3.B3
[1] "General" "Life"

$A4.B4
[1] "General" "Life"

$A5.B5
[1] "General" "Life"


Then here are those with single policies.

> out[ which( sapply(out, length) == 1 ) ]
$A1.B1
[1] "General"

$A3.B1
[1] "General"



David Winsemius wrote:
On Oct 28, 2009, at 9:30 PM, Steven Kang wrote:

Dear R users,


Basically, from the following arbitrary data set:

a <-
data
.frame
(id
=
c
(c
("A1
","A2
","A3
","A4
","A5
"),c
("A3
","A2
","A3
","A4","A5")),loc=c("B1","B2","B3","B4","B5"),clm=c(rep(("General"), 6),rep("Life",4)))

a
   id   loc  clm
1  A1  B1 General
2  A2  B2 General
3  A3  B3 General
4  A4  B4 General
5  A5  B5 General
6  A3  B1 General
7  A2  B2    Life
8  A3  B3    Life
9  A4  B4    Life
10 A5  B5    Life

I desire removing records (highlighted records above) with identical values
in each fields ("id" & "loc") but with different value of "clm" (i.e
according to category)

Take a look at this merge operation on separate rows of "a".

> merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T)
   id loc clm.x   clm.y
1 A1  B1  <NA> General
2 A2  B2  Life General
3 A3  B1  <NA> General
4 A3  B3  Life General
5 A4  B4  Life General
6 A5  B5  Life General

Assignment of that object and selection with is.na should complete the process.

> a2m <- merge( a[a$clm=="Life", ], a[a$clm=="General", ] , by=c("id", "loc"), all=T)

 > a2m[ is.na(a2m$clm.x) | is.na(a2m$clm.y), ]
   id loc clm.x   clm.y
1 A1  B1  <NA> General
3 A3  B1  <NA> General

Alternate methods might include paste-ing id to loc and removing duplicates.


i.e
categ <- table(a$id,a$clm)
categ
    General Life
 A1       1    0
 A2       1    1
 A3       2    1
 A4       1    1
 A5       1    1

The desired output is

   id   loc  clm
1  A1  B1 General
6  A3  B1 General

Because the data set I am working on is quite big (~ 800,000 x 20)
with majority of the fields values being long strings, looping turned out to
be very inefficient in comapring individual rows..

Are there any alternative efficient methods in implementing this problem?
Steven

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to