On Apr 28, 2012, at 10:46 AM, Daniel Malter wrote:

Hi,

I have a data frame whose first row (not the header) contains the true
column names. The same column name can occur multiple times in the dataset. Columns with equal names are not adjacent, and for each observation only one
of the equally named columns contains the actual data (see the example
below). I am looking for an easy method consolidate these columns into one
column for each unique column name. Say,

x1<-c("x",1,NA,NA)
x2<-c("x",NA,2,NA)
x3<-c("x",NA,NA,3)
y1<-c("y",3,NA,NA)
y2<-c("y",NA,1,NA)
y3<-c("y",NA,NA,2)
d<-data.frame(x1,y1,x2,y2,x3,y3)
d


It would avoid problems with manipulating factors it these were created (or converted to) character columns, choose one of:

d=data.frame(x1,y1,x2,y2,x3,y3, stringsAsFactors=FALSE)

d[]<-lapply(d, as.character)


# d looks like:

   x1   y1   x2   y2   x3   y3
1    x    y    x    y    x    y
2    1    3 <NA> <NA> <NA> <NA>
3 <NA> <NA>    2    1 <NA> <NA>
4 <NA> <NA> <NA> <NA>    3    2

From this, I want to create the table or data frame

x y
1 3
2 1
3 2

na.omit(
    data.frame(
        X=stack(d[-1,grep("x", names(d))]),
        Y=stack(d[-1,grep("y", names(d))]),
        stringsAsFactors=FALSE)[ c(1,3) ])

  X.values Y.values
1        1        3
5        2        1
9        3        2


If it were less regular you might need to merge with the "source" columns that stack generates.


--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to