Hello,
I have only recently begun working with R, so I am not very experienced.
I cannot find a clear way to process my data frame, which is fairly
large.
It looks similar to this:

> name=c("A","B","C","B","C","C","C","B","C") 
> nicknames=c("A1","B1","C1","B2","C2","C3","C4","B3","C5") 
> value=c(4,5,9,2,7,6,3,6,7) 
> table=data.frame(cbind(name,nicknames,value))
> table 
  name nicknames value
1    A        A1     4
2    B        B1     5
3    C        C1     9
4    B        B2     2
5    C        C2     7
6    C        C3     6
7    C        C4     3
8    B        B3     6
9    C        C5     7
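
A side note on the construction above: cbind() applied to vectors of mixed type returns a character matrix, so the value column reaches the data frame as text (or as a factor in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). A minimal sketch, assuming the same three vectors, of passing them to data.frame() directly so that value stays numeric; the names m and table2 are only illustrative:

```r
name <- c("A","B","C","B","C","C","C","B","C")
nicknames <- c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
value <- c(4,5,9,2,7,6,3,6,7)

# cbind() first builds a character matrix, so the numbers become strings
m <- cbind(name, nicknames, value)
is.character(m)            # TRUE

# passing the vectors directly keeps each column's own type
table2 <- data.frame(name, nicknames, value, stringsAsFactors = FALSE)
is.numeric(table2$value)   # TRUE
```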

So I have to rearrange it in the following way:
- the first column should contain only the unique names (I have done this, it
is OK), so it will look like
  name
1 A
2 B
3 C

- the second column should contain all the 'nicknames' that correspond to
each single A, B or C
  name nickname
1 A    A1
2 B    B1,B2,B3
3 C    C1,C2,C3,C4,C5

- the third column should contain the mean of the values that correspond to
the same A, B or C
  name nickname       value
1 A    A1             mean(c(4))
2 B    B1,B2,B3       mean(c(5,2,6))
3 C    C1,C2,C3,C4,C5 mean(c(9,7,6,3,7))
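
The target layout described above could be sketched with base R's aggregate() and merge(). This is only one possible way to express it, under the assumption that the data frame is built with a numeric value column; the names df, nick, avg and wanted are illustrative and not part of the code in this post:

```r
name <- c("A","B","C","B","C","C","C","B","C")
nicknames <- c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
value <- c(4,5,9,2,7,6,3,6,7)
df <- data.frame(name, nicknames, value, stringsAsFactors = FALSE)

# one row per name, nicknames pasted together with commas
nick <- aggregate(nicknames ~ name, data = df, FUN = paste, collapse = ",")
# one row per name, mean of the values
avg <- aggregate(value ~ name, data = df, FUN = mean)

# join the two summaries on the shared 'name' column
wanted <- merge(nick, avg, by = "name")
```

Here wanted has the three columns name, nicknames and value, with one row for each of A, B and C.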

I did this using a 'for' loop.
To be clear, I created three data frames, one for each of the columns, and
combine them at the end.

> # positions of the first occurrence of each name (no duplicates)
> ulist=which(!duplicated(table$name))
> # the unique names
> name1=data.frame(table$name[ulist])
> # one-column data frames with one row per unique name, filled in by the loop
> nicknames1=data.frame(V1=character(length(ulist)),stringsAsFactors=FALSE)
> value1=data.frame(V1=numeric(length(ulist)))

> for(i in seq_along(ulist)) {
    # rows of 'table' that belong to the i-th unique name
    position=which(as.character(name1[i,1])==table$name)
    nicknames1[i,1]=toString(table$nicknames[position])
    # as.character() first, so a factor column gives its numbers, not its level codes
    value1[i,1]=mean(as.numeric(as.character(table$value[position])))
  }
> fin=cbind(name1,nicknames1,value1)
> colnames(fin)=c("NAME","NICKNAME","VALUE")
> fin
  NAME           NICKNAME    VALUE
1    A                 A1 4.000000
2    B         B1, B2, B3 4.333333
3    C C1, C2, C3, C4, C5 6.400000
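
One detail worth checking in the loop is how value is converted to numbers. When a column is stored as a factor (which data.frame() did by default for text before R 4.0.0), as.numeric() returns the internal level codes rather than the numbers shown on screen. A small sketch of the difference, using the value vector from above; f, codes, nums and rows_B are illustrative names:

```r
value <- c(4,5,9,2,7,6,3,6,7)
f <- factor(value)                      # levels: "2" "3" "4" "5" "6" "7" "9"

codes <- as.numeric(f)                  # position of each entry within the levels
nums  <- as.numeric(as.character(f))    # the numbers that were actually stored

rows_B <- c(2, 4, 8)                    # rows whose name is "B"
mean(codes[rows_B])                     # mean of the codes 4, 1, 5
mean(nums[rows_B])                      # mean of the data 5, 2, 6
```

Only the second mean matches the intended mean of 5, 2 and 6.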

It works. But in general I work with much larger data frames
(tens of thousands of rows or more),
and my loop is too slow for them (e.g., a data frame of 20000 rows and 3 columns
is processed in about 10 minutes).
I intend to put this into a function, so I expect the running time to be
even longer.
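
For the speed concern, one direction that may be worth trying is to drop the explicit loop and let tapply() do the grouping in a single pass per column. The sketch below is only an illustration under assumptions: the helper name summarise_by_name and the synthetic data frame big are made up here, and no timing result is claimed; system.time() simply shows how the run time could be measured on real data.

```r
# Hypothetical helper: expects columns 'name', 'nicknames' and 'value'
summarise_by_name <- function(df) {
  # as.character() guards against factor columns (the default before R 4.0.0)
  grp  <- as.character(df$name)
  nick <- as.character(df$nicknames)
  val  <- as.numeric(as.character(df$value))

  nick_by <- tapply(nick, grp, toString)  # "B1, B2, B3" style strings
  mean_by <- tapply(val, grp, mean)       # one mean per name

  data.frame(NAME     = names(nick_by),
             NICKNAME = as.vector(nick_by),
             VALUE    = as.vector(mean_by),
             stringsAsFactors = FALSE)
}

# Synthetic data of the size mentioned above (made up for illustration)
set.seed(1)
n <- 20000
big <- data.frame(name      = sample(paste0("N", 1:2000), n, replace = TRUE),
                  nicknames = paste0("K", seq_len(n)),
                  value     = runif(n),
                  stringsAsFactors = FALSE)

timing <- system.time(res <- summarise_by_name(big))
```

The rows of the result come out in sorted order of the names, which may differ from the order of first appearance used by the loop.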

If someone can advise me how to modify what I have done, or suggest another
way to do it, please send me a message.

Kind regards to everyone who develops and maintains R for such dummies as
me,
Alex Levitchi 




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
