Hi: Here's one approach using the reshape() function in base R:
# Read in your data: d <- read.table(textConnection(" Candidate.ID Specialty Office Score 110002 C London 47 110002 C East 48 110003 RM West 45 110003 RM Southwest 39 110003 C Southwest 38 110004 H South 42 110006 G East 47 110006 G London 45"), header = TRUE) closeAllConnections() # Create a variable to distinguish positions d$Position <- c(1, 2, 1, 2, 1, 1, 1, 2) reshape(d, idvar = 'Candidate.ID', timevar = 'Position', v.names = c('Specialty', 'Office', 'Score'), direction = 'wide') HTH, Dennis On Sun, Sep 25, 2011 at 6:47 PM, Jen M <jmstatsh...@gmail.com> wrote: > Hi all, > > I'm having a problem restructuring my data the way I'd like it. I have data > that look like this: > > Candidate.ID Specialty Office Score > 110002 C London 47 > 110002 C East 48 > 110003 RM West 45 > 110003 RM Southwest 39 > 110003 C Southwest 38 > 110004 H South 42 > 110006 G East 47 > 110006 G London 45 > > Candidates can apply for the same job specialty in up to 2 offices (never > more). They can apply for different specialties in further centres. I > would like to look at score differences when candidates apply for the same > specialty in two different offices. With the help of the archives I have > tried various stack/unstack and reshape/melt/cast combinations, and I've > managed to get a huge matrix where the columns are all possible combinations > of Specialties & Offices - and there are many. This leaves a very sparse > matrix with mainly null values, and this is not what I want. I'd like the > scores from the two attempts in two columns so I can do scatterplots, > calculate differences by specialty etc. In SPSS I'd use 'restructure' to get > what I want. I'm working to order with specific requests here so I have to > do it this way (as opposed to a modelling approach). > > I would like it restructured to look something like this: > > Candidate.ID Specialty Office.1 Score.1 Office.2 > Score.2 > 110002 C London 47 > East 48 > 110003 RM West 45 > Southwest 39 > 110003 C Southwest 38 > 110004 H South 42 > 110006 G East 47 > London 45 > 110006 G London 45 > > So one row per candidate/specialty combination, with 2 sets of > offices/scores and null values in the second set if they've only applied > once for that specialty. Can anyone help me out with this? Is it possible > using stack or reshape? > > Many thanks for reading, > Jan > > PS Closest I've come to what I need is the sparse matrix produced by this: > > recast(spec.scores, Candidate.ID ~ Specialty + Office, measure.var="Score") > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.