Re: [R] Complicated analysis for huge databases

Allaisone 1 Fri, 17 Nov 2017 22:36:03 -0800

Thanks Boris, this was very helpful, but I'm struggling with the last part.


1) I combined the first two columns:


library(tidyr)
# Merge MealA and MealB into one ID column; remove=FALSE keeps the originals
SingleMealsCode <- unite(MyData, MealsCombinations, c(MealA, MealB), remove=FALSE)
# Drop column 2
SingleMealsCode <- SingleMealsCode[,-2]

2) I split this data frame into separate data frames based on the
"MealsCombinations" column, so R will recognize each meal combination
separately:

SeparatedGroupsofmealsCombs <- split(SingleMealsCode, SingleMealsCode$MealsCombinations)

After inspecting the structure of "SeparatedGroupsofmealsCombs", I can see
a list of data frames, each of which represents a different meal
combination, which is great.

Now I'm struggling with the last part: how can I run the maf code for all
of the data frames?

When I run the code as before:

maf <- apply(SeparatedGroupsofmealsCombs, 2, function(x)maf(tabulate(x+1)))

I get the error message "dim(X) must have a positive length". I'm not sure
which length I need to specify. Any suggestions to correct this syntax?
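
Here is a minimal sketch that reproduces the message on toy data (the list
`groups` and its values are made up for illustration, they are not my real
data). It suggests that the object split() returns is a list with no dim
attribute:

```r
# Toy stand-in for the split() result; names and values are made up.
groups <- list(
  "33_55" = data.frame(I = c(0, 1, 2), II = c(1, 0, 1)),
  "44_66" = data.frame(I = c(0, 1, 2), II = c(2, 1, 0))
)

# A list has no dim attribute, so this is NULL.
list_dim <- dim(groups)

# apply() needs an object with dimensions; capture the error text
# instead of stopping the script.
msg <- tryCatch(
  apply(groups, 2, function(x) sum(x)),
  error = function(e) conditionMessage(e)
)

print(list_dim)
print(msg)
```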

Regards
Allaisone

________________________________
From: Boris Steipe <boris.ste...@utoronto.ca>
Sent: 17 November 2017 21:12:06
To: Allaisone 1
Cc: R-help
Subject: Re: [R] Complicated analysis for huge databases

Combine columns 1 and 2 into a column with a single ID like "33.55", "44.66" 
and use split() on these IDs to break up your dataset. Iterate over the list of 
data frames split() returns.
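
A rough sketch of what I mean, on toy data. I don't know which package your
maf() comes from, so maf_counts() below is only a stand-in I made up, as are
`cols` and the toy values; swap in your own function and column names.

```r
# Toy data shaped like the example in the original post (values are made up).
MyData <- data.frame(
  MealA   = c(33, 33, 33, 44, 44, 44),
  MealB   = c(55, 55, 55, 66, 66, 66),
  Cust.ID = c(1, 3, 5, 7, 4, 9),
  I       = c(0, 1, 2, 0, 1, 2),
  II      = c(1, 0, 1, 2, 1, 0),
  III     = c(2, 2, 1, 2, 0, 1)
)

# Hypothetical stand-in for the poster's maf(): takes the counts of
# 0, 1 and 2 and returns the smaller of the two allele frequencies.
maf_counts <- function(counts) {
  p <- (counts[2] + 2 * counts[3]) / (2 * sum(counts))
  min(p, 1 - p)
}

# Hypothetical vector naming the columns of interest.
cols <- c("I", "III")

# One ID per meal combination, e.g. "33.55".
id <- paste(MyData$MealA, MyData$MealB, sep = ".")

# split() returns a named list of data frames, one per ID.
groups <- split(MyData[, cols, drop = FALSE], id)

# The outer sapply() walks the list, the inner sapply() walks the columns.
# nbins = 3 keeps the count vector at length 3 even if a value is absent.
MAF <- sapply(groups, function(g) {
  sapply(g, function(x) maf_counts(tabulate(x + 1, nbins = 3)))
})

print(MAF)
```

MAF should come back as a matrix with one row per column of interest and one
column per meal combination. The point is that a list wants lapply() or
sapply(), while apply() is meant for matrices and arrays.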


B.

> On Nov 17, 2017, at 12:59 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:
>
>
> Hi all,
>
>
> I have a large dataset of around 600,000 rows and 600 columns. The first
> column holds codes for Meal A and the second column holds codes for Meal B.
> The third column holds customer IDs; each customer had one combination of
> meals. Each of the remaining columns contains the values 0, 1, or 2. The
> dataset is organised so that the first group of customers shares one meal
> combination, followed by another group of customers with a different
> combination, and so on. The dataset looks like this:
>
>
>> MyData
>
>     Meal A   Meal B   Cust.ID   I   II   III   IV   ...... 600
> 1   33       55       1         0   1    2     0
> 2   33       55       3         1   0    2     2
> 3   33       55       5         2   1    1     2
> 4   44       66       7         0   2    2     2
> 5   44       66       4         1   1    0     1
> 6   44       66       9         2   0    1     2
> .
> .
> 600,000
>
>
> I want to compute maf() for each column (4 to 600) after tabulating the
> frequency of the three values (0, 1, 2), but this should be done group by
> group (i.e. group 33-55: rows 1:3, then group 44-66: rows 4:6, and so on).
>
>
> I can do the analysis for an entire column like this, but not group by
> group:
>
>
> MAF <- apply(MyData[,4:600], 2, function(x)maf(tabulate(x+1)))
>
> How can I modify this code to tell R to do the analysis group by group for
> each column, so that I get the maf value for group 33-55 in column I, then
> the maf value for group 44-66 in the same column I, then the rest of the
> groups in this column, and likewise for the remaining columns?
>
> In fact, I'm interested in doing this analysis for only 300 columns, not
> all of the 600 columns.
> I have another sheet that contains the names of the columns of interest,
> like this:
>
>> ColOfinterest
>
> Col
> I
> IV
> V
> .
> .
> 300
>
> Could anyone help with the best combination of syntax to perform this
> complex analysis?
>
> Regards
> Allaisone
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

