Dear R community,

Recently Henrique Dallazuanna saved me by solving the following data transformation problem:
(n_, _n, j_, k_ signify numbers)

SOURCE DATA:

id  cycle1  cycle2  cycle3  …  cycle_n
1   c       c       c          c
1   m       m       m          m
1   f       f       f          f
2   m       m       m          NA
2   f       f       f          NA
2   c       c       c          NA
3   a       a       NA         NA
3   c       c       c          NA
3   f       f       f          NA
3   NA      NA      m          NA
...........................................

Q: How do I transform the source data into:

RESULT DATA:

id  cyc1  cyc2  cyc3  …  cyc_n
1   cfm   cfm   cfm      cfm
2   cfm   cfm   cfm
3   acf   acf   cfm
...........................................

Henrique's solution is:

    aggregate(. ~ id, lapply(df, as.character),
              FUN = function(x) paste(sort(x), collapse = ''),
              na.action = na.pass)

Could somebody EXPLAIN HOW IT WORKS? Henrique really did rescue my investigation. However, since I am about to analyse cancer chemotherapy in 500 patients, it would be nice to know what I am actually doing.

1. All the help says about the LHS in formulas like '. ~ id' is that it is called "dot notation", and not a single word more. Thus I have no clue what the dot in that formula really means.

2. The help says: Note that ‘paste()’ coerces ‘NA_character_’, the character missing value, to ‘"NA"’. And at the same time: ‘na.pass’ returns the object unchanged. I am happy that no NAs end up in my result. I just don't understand how that happened.

3. I can't see the real difference between 'FUN = function(x) paste(x)' and 'FUN = paste'. However, the former works perfectly while the latter simply does not.

All I can follow from the code above is that R splits the data into groups with the same id, then breaks each little 'cycle' piece into separate characters, then sorts them and pastes these characters back together within the same id for each 'cycle'. What I miss is how R puts all this mess back together into a nice data frame in long format. The NAs are also a question, as I said before.

Could you please shed some light on this, if you don't mind answering such naive questions?
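For reference, here is a minimal reproducible sketch of the example above. The data frame name `df` is taken from Henrique's call; the column `cycle4` is only my stand-in for cycle_n, and the helper names `res`, `x`, `sorted` and `one_cell` are made up for illustration. The last part rebuilds a single cell by hand, which seems to show where the NAs go, though I may be misreading what aggregate() does internally.

```r
# Source data from the example above; cycle4 is a stand-in for cycle_n (assumption)
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA, "c", "f", "m"),
  cycle4 = c("c", "m", "f", rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Henrique's call, unchanged apart from spacing
res <- aggregate(. ~ id, lapply(df, as.character),
                 FUN = function(x) paste(sort(x), collapse = ""),
                 na.action = na.pass)
print(res)

# One cell by hand: id 3, cycle1
x <- df$cycle1[df$id == 3]                 # "a" "c" "f" NA
sorted <- sort(x)                          # sort() drops NA by default (na.last = NA)
one_cell <- paste(sorted, collapse = "")   # collapse glues the pieces into one string
print(one_cell)                            # "acf"

# Without sort(), paste() turns the NA into the text "NA"
print(paste(x, collapse = ""))             # "acfNA"
```

If this reading is right, it is sort() rather than na.pass that removes the missing values: na.pass only keeps the rows with NAs from being dropped before FUN sees them.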
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.