Hello, I recently began working with R, so I am not very experienced. I cannot find a clear way to process my data frame, which is a large one. It looks similar to this:
> name = c("A","B","C","B","C","C","C","B","C")
> nicknames = c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
> value = c(4,5,9,2,7,6,3,6,7)
> table = data.frame(name, nicknames, value)   # no cbind(): it would coerce value to character (or factor)
> table
  name nicknames value
1    A        A1     4
2    B        B1     5
3    C        C1     9
4    B        B2     2
5    C        C2     7
6    C        C3     6
7    C        C4     3
8    B        B3     6
9    C        C5     7

I have to rearrange it as follows:

- the first column should contain only the unique names (I did this, it is OK), so it looks like

  1 A
  2 B
  3 C

- the second column should contain all the nicknames that correspond to each single name A, B or C:

  1 A A1
  2 B B1,B2,B3
  3 C C1,C2,C3,C4,C5

- the third column should contain the mean of the values that correspond to the same name A, B or C:

  1 A A1             mean(4)
  2 B B1,B2,B3       mean(c(5,2,6))
  3 C C1,C2,C3,C4,C5 mean(c(9,7,6,3,7))

I did this with a 'for' loop. To be precise, I built one result vector per output column and finally combined them:

> ulist = which(!duplicated(table$name))   # positions of the first occurrence of each name
> name1 = table$name[ulist]                # the unique names
> nicknames1 = character(length(ulist))    # one result slot per unique name
> value1 = numeric(length(ulist))          # one result slot per unique name
> for (i in seq_along(ulist)) {
+   position = which(table$name == name1[i])   # rows belonging to the i-th name
+   nicknames1[i] = toString(table$nicknames[position])
+   value1[i] = mean(table$value[position])
+ }
> fin = data.frame(NAME = name1, NICKNAME = nicknames1, VALUE = value1)
> fin
  NAME           NICKNAME    VALUE
1    A                 A1 4.000000
2    B         B1, B2, B3 4.333333
3    C C1, C2, C3, C4, C5 6.400000

It works. But in general I work with much larger data frames (tens of thousands of rows or more), and my loop is too slow: a data frame of 20000 rows and 3 columns takes about 10 minutes to process.
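A vectorized alternative in base R could look like the following, a minimal sketch assuming the column names from the example (`grp`, `nick_by_name` and `value_by_name` are hypothetical helper names). Instead of rescanning the whole `name` column with `which()` once per unique name, `tapply()` splits the rows into groups a single time and applies `toString()` and `mean()` to each group, which should scale much better on tens of thousands of rows.

```r
# Example data from the post; data.frame() without cbind() keeps `value` numeric
name      <- c("A", "B", "C", "B", "C", "C", "C", "B", "C")
nicknames <- c("A1", "B1", "C1", "B2", "C2", "C3", "C4", "B3", "C5")
value     <- c(4, 5, 9, 2, 7, 6, 3, 6, 7)
table     <- data.frame(name, nicknames, value, stringsAsFactors = FALSE)

# Grouping factor whose levels follow the order of first appearance,
# so the result rows come out in the same order as in the for-loop version
grp <- factor(table$name, levels = unique(table$name))

# tapply() splits each column by group once and applies FUN to every piece;
# both calls use the same factor, so their results are aligned level by level
nick_by_name  <- tapply(table$nicknames, grp, toString)
value_by_name <- tapply(table$value, grp, mean)

# as.vector() drops the 1-d array structure that tapply() returns
fin <- data.frame(NAME     = levels(grp),
                  NICKNAME = as.vector(nick_by_name),
                  VALUE    = as.vector(value_by_name),
                  stringsAsFactors = FALSE)
print(fin)
```

On R 4.0 and later `stringsAsFactors = FALSE` is already the default; it is spelled out only for older versions. For the mean column alone, `aggregate(value ~ name, data = table, FUN = mean)` is another base-R route, though it returns the groups sorted by name rather than in order of first appearance.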
I intend to integrate this into a function, so the total running time will be even longer. If anyone can suggest how to modify what I have done, or a different way to do it, please let me know.

Kind regards to all of you who develop and maintain R for beginners like me,
Alex Levitchi

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.