Dear David,
Thanks so much; I was able to get it to work for my data! I don't really
understand yet how the function works, but it seems extremely useful.
Thanks again!
Annemarie
David Winsemius wrote:
On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote:
Dear people from the R help list,
I have a question that I can't work out how to start answering,
which is why I am writing to the list.
I have data in a format like this (tabs might look weird):
John A1 1 0 1
John A2 1 1 1
John A3 1 0 0
Mary A1 1 0 1
Mary A2 0 0 1
Mary A3 1 1 0
Peter A1 1 0 0
Peter A2 0 0 1
Peter A3 1 1 1
Josh A1 1 0 0
Josh A2
Josh A3 0 0 0
I want to convert it into a format where all the rows for a single
subject are placed side by side on one line, with the different scores
still matching up (i.e., it needs to be able to cope with missing
data, as for Josh's A2 score).
John A1 1 0 1 A2 1 1 1 A3 1 0 0
Mary A1 1 0 1 A2 0 0 1 A3 1 1 0
Peter A1 1 0 0 A2 0 0 1 A3 1 1 1
Josh A1 1 0 0 A2 A3 0 0 0
Preferably, the row identification would become the header of the new
table, something like this:
       A11  A12  A13  A21  A22  A23  A31  A32  A33
John     1    0    1    1    1    1    1    0    0
Mary     1    0    1    0    0    1    1    1    0
Peter    1    0    0    0    0    1    1    1    1
Josh     1    0    0                   0    0    0
Probably, this has been addressed before - I just don't know how to
search for the answer with the right search terms.
Any help is appreciated, even just a link to a page where this is
addressed!
There is a reshape function in the stats package that nobody except
Phil Spector seems to understand, and then there are the reshape and
reshape2 packages that everybody seems to get. (I don't understand why
the classification variables are on the left-hand side of the formula,
though. Positionally it makes some sense, but logically it does not
connect with how I understand the process.)
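For comparison, here is a minimal sketch of the base-R route with stats::reshape(). It is not the approach used in the rest of this message. The data frame is rebuilt inline so the snippet runs on its own, and the object name wide is only an illustrative choice.

```r
# Rebuild the example data by hand (values copied from the question;
# Josh's missing A2 scores are entered as NA).
nam123 <- data.frame(
  V1 = rep(c("John", "Mary", "Peter", "Josh"), each = 3),
  V2 = rep(c("A1", "A2", "A3"), times = 4),
  V3 = c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1, NA, 0),
  V4 = c(0, 1, 0, 0, 0, 1, 0, 0, 1, 0, NA, 0),
  V5 = c(1, 1, 0, 1, 1, 0, 0, 1, 1, 0, NA, 0)
)

# idvar identifies the subject (one output row each); timevar identifies
# the repeated block (A1, A2, A3) that is spread across columns.
wide <- reshape(nam123, idvar = "V1", timevar = "V2", direction = "wide")

# Column names combine the score column and the block label, e.g. "V3.A1".
print(names(wide))
print(wide)
```

The result has one row per subject and nine score columns, with NA kept in place for Josh's A2 block.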
require(reshape2)
# entered your data with default names V1 V2 V3 V4 V5
> nam123
V1 V2 V3 V4 V5
1 John A1 1 0 1
2 John A2 1 1 1
3 John A3 1 0 0
4 Mary A1 1 0 1
5 Mary A2 0 0 1
6 Mary A3 1 1 0
7 Peter A1 1 0 0
8 Peter A2 0 0 1
9 Peter A3 1 1 1
10 Josh A1 1 0 0
11 Josh A2 NA NA NA
12 Josh A3 0 0 0
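One way the data above could have been entered is sketched below, assuming the values are pasted in as text and Josh's blank A2 scores are typed as NA. The helper name txt is hypothetical; read.table() assigns the default names V1 to V5 when no header is given.

```r
# Hypothetical reconstruction of the data entry step; the missing A2
# scores for Josh are written explicitly as NA so every row has 5 fields.
txt <- "John A1 1 0 1
John A2 1 1 1
John A3 1 0 0
Mary A1 1 0 1
Mary A2 0 0 1
Mary A3 1 1 0
Peter A1 1 0 0
Peter A2 0 0 1
Peter A3 1 1 1
Josh A1 1 0 0
Josh A2 NA NA NA
Josh A3 0 0 0"

# header = FALSE yields the default column names V1..V5;
# stringsAsFactors = TRUE makes V1 and V2 factors, as in the str() output.
nam123 <- read.table(text = txt, header = FALSE, stringsAsFactors = TRUE)
str(nam123)
```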
> nams.mlt <- melt(nam123, id.vars=c("V1", "V2"))
> str(nams.mlt)
'data.frame': 36 obs. of 4 variables:
$ V1 : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ...
$ V2 : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ...
$ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ...
$ value : int 1 1 1 1 0 1 1 0 1 1 ...
> dcast(nams.mlt, V1+V2 ~ variable)
V1 V2 V3 V4 V5
1 John A1 1 0 1
2 John A2 1 1 1
3 John A3 1 0 0
4 Josh A1 1 0 0
5 Josh A2 NA NA NA
6 Josh A3 0 0 0
7 Mary A1 1 0 1
8 Mary A2 0 0 1
9 Mary A3 1 1 0
10 Peter A1 1 0 0
11 Peter A2 0 0 1
12 Peter A3 1 1 1
> dcast(nams.mlt, V1 ~ V2+variable)
V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5
1 John 1 0 1 1 1 1 1 0 0
2 Josh 1 0 0 NA NA NA 0 0 0
3 Mary 1 0 1 0 0 1 1 1 0
4 Peter 1 0 0 0 0 1 1 1 1
You can always change the column names of the data frame if you want,
and in this case it would be a simple sub() operation on the names.
Personally I would replace the "_" with "." rather than with "".
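A small sketch of that renaming step, assuming the column names are exactly those printed by the last dcast() call. The vector nms is typed in by hand here so the snippet runs without reshape2; on the real result one would work on names() of the cast data frame instead.

```r
# Column names as printed by dcast(nams.mlt, V1 ~ V2+variable) above.
nms <- c("V1",
         "A1_V3", "A1_V4", "A1_V5",
         "A2_V3", "A2_V4", "A2_V5",
         "A3_V3", "A3_V4", "A3_V5")

# Replace the first "_" in each name with "."; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression.
dotted <- sub("_", ".", nms, fixed = TRUE)

# Alternative: drop the separator entirely by substituting "".
joined <- sub("_", "", nms, fixed = TRUE)

print(dotted)
print(joined)
```

Names without an underscore, such as V1, pass through unchanged.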
--
Annemarie Verkerk, MA
Evolutionary Processes in Language and Culture (PhD student)
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500AH Nijmegen, The Netherlands
+31 (0)24 3521 185
http://www.mpi.nl/research/research-projects/evolutionary-processes
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help