Den wrote:
Dear R community
Recently, Henrique Dallazuanna literally saved me by solving a
data transformation problem, which follows:
(n_, _n, j_, k_ signify numbers)
SOURCE DATA:
id cycle1 cycle2 cycle3 … cycle_n
1 c c c c
1 m m m m
1 f f f f
2 m m m NA
2 f f f NA
2 c c c NA
3 a a NA NA
3 c c c NA
3 f f f NA
3 NA NA m NA
...........................................
Q: How to transform source data to:
RESULT DATA:
id cyc1 cyc2 cyc3 … cyc_n
1 cfm cfm cfm cfm
2 cfm cfm cfm
3 acf acf cfm
...........................................
Henrique's solution is:
aggregate(. ~ id, lapply(df, as.character),
          FUN = function(x) paste(sort(x), collapse = ''),
          na.action = na.pass)
Could somebody EXPLAIN HOW IT WORKS?
Henrique really did save my investigation.
However, since I am about to investigate cancer chemotherapy
in 500 patients, it would be nice to know what
I am actually doing.
1. All the help says about the LHS in formulas like '. ~ id' is that its
name is "dot notation", and not a single word more. Thus, I have no
clue what the dot in that formula really means.
Well, ?aggregate does (rather gently) point you to the
help page for _formula_ where you will find quite a few
words about the use of '.' in the Details section.
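As a small illustration of what ?formula says about the dot: on the left-hand side of an aggregate() formula, '.' stands for every column of the data that is not already named on the other side. The sketch below uses a made-up numeric data frame (d, g, x and y are invented names, not the chemotherapy data) and suggests that the dot form and the spelled-out cbind() form should give the same result.

```r
# Toy data, invented for illustration only
d <- data.frame(g = c(1, 1, 2, 2),
                x = c(1, 2, 3, 4),
                y = c(10, 20, 30, 40))

# '.' on the left-hand side: all columns other than g
by_dot <- aggregate(. ~ g, data = d, FUN = sum)

# The same request with the columns written out by hand
by_name <- aggregate(cbind(x, y) ~ g, data = d, FUN = sum)

print(by_dot)
print(by_name)
```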
2. The help says:
Note that ‘paste()’ coerces ‘NA_character_’, the character missing
value, to ‘"NA"’
And at the same time:
‘na.pass’ returns the object unchanged.
I am happy that I don't have NAs in my data. I just don't understand
how that happened.
I don't understand what you're asking.
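If the question is why no "NA" text shows up in the result even though paste() would normally produce it, one possible reading is this: na.pass only stops aggregate() from discarding rows that contain NA (the default for the formula method is na.omit), and it is sort() that silently drops the NAs before paste() ever sees them. The following is a minimal sketch of that reading; the vectors and the toy data frame d are invented for illustration.

```r
x <- c("f", NA, "c")

pasted_raw <- paste(x, collapse = "")          # paste() alone turns NA into the text "NA"
sorted     <- sort(x)                          # sort() drops NA by default (na.last = NA)
pasted     <- paste(sorted, collapse = "")     # so only "c" and "f" get pasted

# A cell that is NA in every row of an id collapses to an empty string
all_missing <- paste(sort(c(NA_character_, NA_character_)), collapse = "")

# Toy numeric data (invented) showing what na.action does to rows
d <- data.frame(g = c(1, 1, 2), x = c(1, NA, NA))
dropped <- aggregate(x ~ g, data = d, FUN = length)                       # default: rows with NA removed
kept    <- aggregate(x ~ g, data = d, FUN = length, na.action = na.pass)  # rows with NA kept

print(dropped)
print(kept)
```

With the default na.action, any row containing an NA in any column is removed before FUN is called, so in the source data above id 3 would lose all of its rows.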
3. I can't see the real difference between 'FUN = function(x) paste(x)'
and 'FUN = paste'. However, the former works perfectly while the latter
simply does not.
That's not quite true. You're using paste(sort(x)) and not
just x in Henrique's solution. And that's precisely
the point: when a function is not 'simple', you need to
define it. Henrique is defining it 'on the fly'; you
could also define it separately before the aggregate()
call and then use it like this:
myfun <- function(x) paste(sort(x), collapse='')
aggregate(...., FUN = myfun, ....)
Peter Ehlers
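To make the difference concrete, here is a short sketch on a single invented vector: plain paste() without collapse returns one string per element, while the wrapper sorts the values and collapses them into a single string, which is what each cell of the result needs.

```r
x <- c("m", "f", "c")

plain <- paste(x)      # no collapse: still three separate strings

myfun <- function(x) paste(sort(x), collapse = "")
joined <- myfun(x)     # sorted, then collapsed into one string

length(plain)
joined
```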
All I can follow from the code above is that R breaks the data into groups
with the same id, then splits each little 'cycle' piece into separate
characters, then sorts them and puts these characters together within the
same id for each 'cycle'. I don't see how R puts all this mess back together
into a nice data frame in long format. The NAs are also a question, as I
said before. Could you please shed some light on this, if you don't mind
answering these naive questions?
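One way to picture the reassembly is to do the split, apply and combine steps by hand. This is only a rough sketch of the idea, not aggregate()'s actual internals: the data frame df is a cut-down, invented version of the source data above, collapse_sorted is a hypothetical helper name, and the by= form of aggregate() is used for the comparison instead of the formula form.

```r
# Cut-down version of the source data (ids 1 and 3 only)
df <- data.frame(
  id     = c(1, 1, 1, 3, 3, 3, 3),
  cycle1 = c("c", "m", "f", "a", "c", "f", NA),
  cycle2 = c("c", "m", "f", "a", "c", "f", NA),
  cycle3 = c("c", "m", "f", NA, "c", "f", "m"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop NAs (via sort), order the letters, glue them together
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# For each cycle column: split by id, apply the helper to each piece,
# and get back one string per id
pieces <- lapply(df[-1], function(col) {
  unname(sapply(split(col, df$id), collapse_sorted))
})

# Combine: one row per id, one column per cycle
by_hand <- data.frame(id = sort(unique(df$id)), pieces,
                      stringsAsFactors = FALSE)

# The same computation through aggregate()'s by= interface
agg <- aggregate(df[-1], by = list(id = df$id), FUN = collapse_sorted)

print(by_hand)
print(agg)
```

Each column is processed independently, and because every id yields exactly one string per column, the pieces line up as rows of an ordinary data frame.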
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.