Dear Peter

Thank you. Lo and behold, now I've got it. In the code

    aggregate(. ~ id, lapply(df, as.character),
              FUN = function(x) paste(sort(x), collapse = ''),
              na.action = na.pass)

there are no contradictions with the NAs. na.action = na.pass is passed
to aggregate(), whose default is na.omit, and afterwards those NAs are
removed by the sort() call. It is a lot easier for me to deal with data
when I know what I am doing.
Thank you again for your help. Sorry for the annoying naive questions.

With best regards
Denis Kazakiewicz
Belarus

On Sun, 23/01/2011 at 05:38 -0800, P Ehlers wrote:
> Den wrote:
> > Dear Dennis
> > Thank you very much for your comprehensive reply and for the time
> > you've spent dealing with my e-mail.
> > Your kind explanation made things clearer for me.
> > After your explanation it looks simple.
> > lapply with the chosen options takes the small part of cycle<n> with
> > the same id (e.g. df[df$id==3,"cycle2"]) and makes from it just a
> > bunch of characters.
> > The only thing I still don't get is how this code gets rid of the
> > NAs, but this is a rather minor technical issue. The main question
> > for me was the formula. You helped me indeed.
>
> Okay, now I see what you're asking regarding the NAs.
> I should have realized it before. Anyway, the answer
> is in the function sort(). Have a look at its help
> page and note what sort does when 'na.last=NA', the
> default. You'll see where the NAs went.
>
> Peter Ehlers
>
> > Thank you again
> > Have a nice day
> > Denis
> >
> > From bending but not broken Belarus
> >
> > On Sat, 22/01/2011 at 17:55 -0800, Dennis Murphy wrote:
> >> Hi:
> >>
> >> I wouldn't pretend to speak for Henrique, but I'll give it a shot.
> >>
> >> On Sat, Jan 22, 2011 at 4:44 AM, Den <d.kazakiew...@gmail.com> wrote:
> >>         Dear R community
> >>         Recently, dear Henrique Dallazuanna literally saved me by
> >>         solving one problem on data transformation, which follows:
> >>
> >>         (n_, _n, j_, k_ signify numbers)
> >>
> >>         SOURCE DATA:
> >>         id  cycle1  cycle2  cycle3  …  cycle_n
> >>         1   c       c       c          c
> >>         1   m       m       m          m
> >>         1   f       f       f          f
> >>         2   m       m       m          NA
> >>         2   f       f       f          NA
> >>         2   c       c       c          NA
> >>         3   a       a       NA         NA
> >>         3   c       c       c          NA
> >>         3   f       f       f          NA
> >>         3   NA      NA      m          NA
> >>         ...........................................
> >>
> >>
> >>         Q: How to transform source data to:
> >>         RESULT DATA:
> >>         id  cyc1  cyc2  cyc3  …  cyc_n
> >>         1   cfm   cfm   cfm      cfm
> >>         2   cfm   cfm   cfm
> >>         3   acf   acf   cfm
> >>         ...........................................
> >>
> >>
> >>
> >>         Henrique's solution is:
> >>
> >>         aggregate(. ~ id, lapply(df, as.character), FUN =
> >>         function(x) paste(sort(x), collapse = ''), na.action = na.pass)
> >>
> >> The first part, . ~ id, is the formula. It's using every available
> >> variable in the input data on the left hand side of the formula except
> >> for id, which is the grouping variable.
> >>
> >> The data object is lapply(df, as.character), which is a list object
> >> that translates every element to character. I'm guessing that each
> >> element of the list is a character string or list of character
> >> strings, but I'm not sure. It looks like the individual characters of
> >> each cycle comprise a list component within id. (??) [My guess: the
> >> result of lapply() is a list of lists. The top-level list components
> >> correspond to the id's, while the second-level components are the
> >> cycle variables, whose elements are the characters in each cycle
> >> variable for each row with the same id.]
> >>
> >> The function to be applied to each id is described in FUN. As Peter
> >> mentioned, it's an 'anonymous' function, which means it is defined
> >> in-line.
> >> In this case, the function sorts the elements of a generic input
> >> object x in increasing order and then combines them into a single
> >> string (the purpose of collapse = ); NA values are skipped over.
> >> Thus, if my hypothesis about the structure of the list is correct,
> >> the three characters in each cycle/id combination are first sorted
> >> and then combined into a single string, which is then output as the
> >> result. Because of the way Henrique wrote the formula, the
> >> aggregate() function will march through each cycle variable within
> >> id and execute the function, and then iterate the process over all
> >> id's.
> >>
> >>
> >>
> >>         Could somebody EXPLAIN HOW IT WORKS?
> >>         I mean, Henrique saved my investigation indeed.
> >>         However, considering the fact that I am about to perform an
> >>         investigation of cancer chemotherapy in 500 patients, it
> >>         would be nice to know what I am actually doing.
> >>
> >> Henrique's R knowledge is on a different level from most of us, so I
> >> understand your question :)
> >>
> >>
> >>         1. All the help says about the LHS in formulas like '.~id'
> >>         is that its name is "dot notation". And not a single word
> >>         more. Thus, I have no clue what the dot in that formula
> >>         really means.
> >>
> >> . is shorthand for 'everything not otherwise specified in the model
> >> formula'. In this case, it represents the entire set of cycle
> >> variables.
> >>
> >>
> >>         2. help says:
> >>         Note that ‘paste()’ coerces ‘NA_character_’, the character
> >>         missing value, to ‘"NA"’
> >>         And at the same time:
> >>         ‘na.pass’ returns the object unchanged.
> >>         I am happy that I don't have NAs in my data. I just don't
> >>         understand how it happened.
> >>         3. Can't see the real difference between
> >>         'FUN = function(x) paste(x)' and 'FUN = paste'. However,
> >>         the former works perfectly while the latter simply does not.
> >>
> >>
> >>         All I can follow from the code above is that R breaks the
> >>         data into groups with the same id, then tears each little
> >>         'cycle' piece into separate characters, then sorts them and
> >>         puts these characters together within the same id on each
> >>         'cycle'. I miss how R puts all this mess back together into
> >>         a nice data frame of long format. The NAs are also a
> >>         question, as I said before.
> >>
> >> By default, aggregate() will try to return a data frame. For each id,
> >> it will output the id and the result of the function applied to each
> >> cycle variable, so there should be one row for each id, and n + 1
> >> columns for the n cycle variables + id.
> >>
> >> Does that help?
> >>
> >> Cheers,
> >> Dennis
> >>
> >>
> >>         Could you please shed some light on it, if you don't mind
> >>         answering those naive questions.
> >>
> >>         ______________________________________________
> >>         R-help@r-project.org mailing list
> >>         https://stat.ethz.ch/mailman/listinfo/r-help
> >>         PLEASE do read the posting guide
> >>         http://www.R-project.org/posting-guide.html
> >>         and provide commented, minimal, self-contained, reproducible
> >>         code.
> >>

-- 
Den <d.kazakiew...@gmail.com>
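
For anyone reading this thread later, here is a minimal sketch of the points discussed above. It assumes a toy data frame modelled on the SOURCE DATA in the original question; the values and the helper name collapse_sorted are made up for illustration and are not the real data. It reruns Henrique's call and checks Peter's point that sort() drops the NAs before paste() ever sees them.

```r
# Toy data modelled on the example in the thread (illustrative values only).
df <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "m", "f", "c", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", "m", "f", "c", NA, "c", "f", "m"),
  stringsAsFactors = FALSE
)

# paste() on its own turns a missing value into the string "NA" ...
paste(c("m", NA, "c"), collapse = "")   # "mNAc"

# ... but sort() uses na.last = NA by default, which removes NAs first.
sort(c("m", NA, "c"))                   # "c" "m"

# Hypothetical named version of the anonymous function used in the thread.
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# Henrique's call: '.' on the left-hand side stands for every column
# other than id; na.pass keeps rows with NAs so that FUN receives them.
res <- aggregate(. ~ id, lapply(df, as.character),
                 FUN = collapse_sorted, na.action = na.pass)
print(res)
```

With the default na.action = na.omit, every row containing an NA would be dropped before FUN is called, so id 3 would lose whole rows rather than just its missing entries. That seems to be why na.pass is combined with sort() here.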