Re: [R] meaning of formula in aggregate function

P Ehlers Sat, 22 Jan 2011 07:37:49 -0800

Den wrote:

Dear R community
Recently, dear Henrique Dallazuanna literally saved me solving one
problem on data transformation which follows:

(n_, _n, j_, k_ signify numbers)

SOURCE DATA:
id      cycle1  cycle2  cycle3  …       cycle_n
1       c       c       c               c
1       m       m       m               m
1       f       f       f               f
2       m       m       m               NA
2       f       f       f               NA
2       c       c       c               NA
3       a       a       NA              NA
3       c       c       c               NA
3       f       f       f               NA
3       NA      NA      m               NA
...........................................


Q: How to transform source data to:
RESULT DATA:
id      cyc1    cyc2    cyc3    …       cyc_n
1       cfm     cfm     cfm             cfm
2       cfm     cfm     cfm
3       acf     acf     cfm
...........................................

Henrique's solution is:

aggregate(. ~ id, lapply(df, as.character),
          FUN = function(x) paste(sort(x), collapse = ''),
          na.action = na.pass)


Could somebody EXPLAIN HOW IT WORKS?
Henrique really did save my investigation.
However, considering that I am about to investigate cancer chemotherapy
in 500 patients, it would be nice to know what I am actually doing.

1. All the help says about the LHS in formulas like '. ~ id' is that its
name is "dot notation". And not a single word more. Thus, I have no
clue what the dot in that formula really means.

Well, ?aggregate does (rather gently) point you to the
help page for _formula_ where you will find quite a few
words about the use of '.' in the Details section.
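As an illustration of the dot, here is a minimal sketch on made-up data (the data frame 'toy' and the helper 'collapse_sorted' are hypothetical names, not from the original post). In aggregate(), a '.' on the left-hand side should stand for every column of the data that does not appear on the right-hand side, so the two calls below are expected to give the same result:

```r
# Hypothetical toy data; names and values are assumptions for illustration.
toy <- data.frame(id     = c(1, 1, 2, 2),
                  cycle1 = c("c", "m", "f", "c"),
                  cycle2 = c("m", "c", "c", "f"),
                  stringsAsFactors = FALSE)

# Sort the values of one group and glue them into a single string.
collapse_sorted <- function(x) paste(sort(x), collapse = "")

# '.' on the LHS: all columns of 'toy' other than 'id'.
with_dot <- aggregate(. ~ id, data = toy, FUN = collapse_sorted)

# The same request with the LHS columns spelled out.
spelled <- aggregate(cbind(cycle1, cycle2) ~ id, data = toy,
                     FUN = collapse_sorted)

print(with_dot)
print(identical(with_dot, spelled))
```

Because the left-hand columns are combined with cbind(), factor columns would be reduced to their integer codes; that is presumably why Henrique converts everything with as.character first.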

2. help says:
 Note that ‘paste()’ coerces ‘NA_character_’, the character missing
value, to ‘"NA"’
And at the same time:
 ‘na.pass’ returns the object unchanged.
I am happy that I don't have NAs in my data.  I just don't understand
how it happened.

I don't understand what you're asking.
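If the question is why no "NA" strings appear in the result, one likely explanation is that sort() drops missing values by default (na.last = NA), so paste() never sees them. Separately, na.action = na.pass keeps rows containing NA in the data; the default for the formula method, na.omit, would discard any row with an NA in any column. A small sketch on made-up data ('toy' and 'collapse_sorted' are hypothetical names):

```r
x <- c("m", NA, "c", "f")
sort(x)                        # "c" "f" "m"  (the NA is gone)
paste(sort(x), collapse = "")  # "cfm"
paste(x, collapse = "")        # "mNAcf"  (without sort, NA becomes "NA")

# Hypothetical toy data: id 2 has NA in cycle2 on every row.
toy <- data.frame(id     = c(1, 1, 2, 2),
                  cycle1 = c("c", "m", "f", "a"),
                  cycle2 = c("c", "m", NA, NA),
                  stringsAsFactors = FALSE)

collapse_sorted <- function(x) paste(sort(x), collapse = "")

# na.pass: rows with NA are kept, sort() then drops the NA values.
kept <- aggregate(. ~ id, data = toy, FUN = collapse_sorted,
                  na.action = na.pass)

# Default na.omit: rows 3 and 4 are removed before grouping,
# so id 2 disappears from the result altogether.
dropped <- aggregate(. ~ id, data = toy, FUN = collapse_sorted)

print(kept)
print(dropped)
```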

3. I can't see the real difference between 'FUN = function(x) paste(x)'
and 'FUN = paste'. However, the former works perfectly while the latter
simply does not.

That's not quite true. You're using paste(sort(x)) and not
just x in Henrique's solution. And that's precisely
the point: when a function is not 'simple', you need to
define it. Henrique is defining it 'on the fly'; you
could also define it separately before the aggregate()
call and then use it like this:

myfun <- function(x) paste(sort(x), collapse='')
aggregate(...., FUN = myfun, ....)
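A filled-in version of that pattern, as a sketch on made-up data (the data frame 'toy' is hypothetical). It also shows that arguments given after FUN are passed on to it, so a bare 'FUN = paste' can be given 'collapse', although it then pastes the values in their original order rather than sorted:

```r
# Hypothetical toy data for illustration only.
toy <- data.frame(id     = c(1, 1, 1, 2, 2, 2),
                  cycle1 = c("c", "m", "f", "m", "f", "c"),
                  stringsAsFactors = FALSE)

# Function defined separately, then handed to aggregate().
myfun <- function(x) paste(sort(x), collapse = "")
res_named <- aggregate(cycle1 ~ id, data = toy, FUN = myfun)

# Bare paste with 'collapse' forwarded through '...': no sorting happens.
res_paste <- aggregate(cycle1 ~ id, data = toy, FUN = paste, collapse = "")

print(res_named)  # cycle1 is "cfm" for both ids
print(res_paste)  # cycle1 is "cmf" for id 1 and "mfc" for id 2
```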

Peter Ehlers


All I can follow from the code above is that R breaks the data into groups
with the same id, then tears each little 'cycle' piece into separate
characters, then sorts them and puts these characters together within the
same id for each 'cycle'. I miss how R puts all this mess back together
into a nice data frame of long format. The NAs are also a question, as I
said before.
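The steps described above can be reproduced by hand, which may make the reassembly less mysterious. This is only a sketch of the idea on made-up data, not the actual internals of aggregate(); 'toy', 'collapse_sorted' and 'by_hand' are hypothetical names. Note that nothing is torn into single characters: for each id and each cycle column, FUN simply receives the vector of that column's values within the group.

```r
# Hypothetical toy data for illustration only.
toy <- data.frame(id     = c(1, 1, 1, 2, 2, 2),
                  cycle1 = c("c", "m", "f", "m", "f", "c"),
                  cycle2 = c("c", "m", "f", "m", NA, "c"),
                  stringsAsFactors = FALSE)

collapse_sorted <- function(x) paste(sort(x), collapse = "")

# Step 1: split one column into a list with one vector per id.
pieces <- split(toy$cycle1, toy$id)

# Step 2: apply the function to each piece, giving one string per id.
collapsed <- sapply(pieces, collapse_sorted)

# Step 3: do the same for every cycle column and bind the results,
# together with the ids, into a data frame with one row per id.
cols <- lapply(toy[c("cycle1", "cycle2")], function(col) {
  unname(sapply(split(col, toy$id), collapse_sorted))
})
by_hand <- data.frame(id = sort(unique(toy$id)), cols,
                      stringsAsFactors = FALSE)

# For comparison, the aggregate() call on the same data.
agg <- aggregate(. ~ id, data = toy, FUN = collapse_sorted,
                 na.action = na.pass)

print(by_hand)
print(agg)
```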

Could you please shed some light on this, if you don't mind answering
these naive questions?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
