> On Nov 18, 2017, at 1:52 AM, Allaisone 1 <allaiso...@hotmail.com> wrote:
>
> Although the loop seems to be formulated correctly I wonder why
> it gives me these errors :
>
> -object 'i' not found
> - unexpected '}' in "}"

You probably did not copy the entire code offered. But we cannot know since you did not "show your code", nor did you post complete error messages. Both of these practices are strongly recommended by the Posting Guide. Please read it (again?).

--
David.

>
> the desired output is expected to be very large as for each dataframe in the
> list of dataframes I expect to see maf value for each of the 600 columns! and
> this is only for one dataframe in the list .. I have around 150-200
> dataframes.. not sure how R will store these results.. but first I need the
> analysis to be done correctly. The final output has to be something like this :-
>
>> mafsforeachcolumns(I,II,...600)foreachcombination
>
>     MealsCombinations   Cust.ID   I       II     III    IV      ......   600
> 1   33-55               1         0.124   0.10   0.65   0.467
>                         3
>                         5
>
> 2   44-66               7         0.134   0.43   0.64   0.479
>                         4
>                         9
>
> .
>
> .
>
> ~180 dataframes
>
> ________________________________
> From: Boris Steipe <boris.ste...@utoronto.ca>
> Sent: 18 November 2017 00:35:16
> To: Allaisone 1; R-help
> Subject: Re: [R] Complicated analysis for huge databases
>
> Something like the following?
>
> AllMAFs <- list()
>
> for (i in length(SeparatedGroupsofmealsCombs) {
>   AllMAFs[[i]] <- apply( SeparatedGroupsofmealsCombs[[i]], 2, function(x)maf( tabulate( x+1) ))
> }
>
> (untested, of course)
> Also the solution is a bit generic since I don't know what the output of
> maf() looks like in your case, and I don't understand why you use tabulate
> because I would have assumed that's what maf() does - but that's not for me
> to worry about :-)
>
> B.
>
>> On Nov 17, 2017, at 7:15 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:
>>
>> Thanks Boris , this was very helpful but I'm struggling with the last part.
>>
>> 1) I combined the first 2 columns :-
>>
>> library(tidyr)
>> SingleMealsCode <-unite(MyData, MealsCombinations, c(MealA, MealB), remove=FALSE)
>> SingleMealsCode <- SingleMealsCode[,-2]
>>
>> 2) I separated this dataframe into different dataframes based on
>> "MealsCombination" column so R will recognize each meal combination
>> separately :
>>
>> SeparatedGroupsofmealsCombs <- split(SingleMealCode,SingleMealCode$MealsCombinations)
>>
>> after investigating the structure of "SeparatedGroupsofmealsCombs" , I can
>> see a list of different databases, each of which represents a different Meal
>> combinations which is great.
>>
>> Now, I'm struggling with the last part, how can I run the maf code for all
>> dataframes?
>>
>> when I run this code as before :-
>>
>> maf <- apply(SeparatedGroupsofmealsCombs, 2, function(x)maf(tabulate(x+1)))
>>
>> an error message says : dim(X) must have a positive length . I'm not sure
>> which length I need to specify.. any suggestions to correct this syntax ?
>>
>> Regards
>> Allaisone
>>
>> From: Boris Steipe <boris.ste...@utoronto.ca>
>> Sent: 17 November 2017 21:12:06
>> To: Allaisone 1
>> Cc: R-help
>> Subject: Re: [R] Complicated analysis for huge databases
>>
>> Combine columns 1 and 2 into a column with a single ID like "33.55", "44.66"
>> and use split() on these IDs to break up your dataset. Iterate over the list
>> of data frames split() returns.
>>
>> B.
>>
>>> On Nov 17, 2017, at 12:59 PM, Allaisone 1 <allaiso...@hotmail.com> wrote:
>>>
>>> Hi all ..,
>>>
>>> I have a large dataset of around 600,000 rows and 600 columns. The first
>>> col is codes for Meal A, the second columns is codes for Meal B. The third
>>> column is customers IDs where each customer had a combination of meals.
>>> Each column of the rest columns contains values 0,1,or 2. The dataset is
>>> organised in a way so that the first group of customers had similar meals
>>> combinations, this is followed by another group of customers with similar
>>> meals combinations but different from the first group and so on. The
>>> dataset looks like this :-
>>>
>>>> MyData
>>>
>>>     Meal A   Meal B   Cust.ID   I   II   III   IV   ......   600
>>> 1   33       55       1         0   1    2     0
>>> 2   33       55       3         1   0    2     2
>>> 3   33       55       5         2   1    1     2
>>> 4   44       66       7         0   2    2     2
>>> 5   44       66       4         1   1    0     1
>>> 6   44       66       9         2   0    1     2
>>> .
>>> .
>>> 600,000
>>>
>>> I wanted to find maf() for each column(from 4 to 600) after calculating the
>>> frequency of the 3 values (0,1,2) but this should be done group by group
>>> (i.e. group(33-55) : rows 1:3 then group(44-66) :rows 4:6 and so on).
>>>
>>> I can do the analysis for the entire column but not group by group like
>>> this :
>>>
>>> MAF <- apply(MyData[,4:600], 2, function(x)maf(tabulate(x+1)))
>>>
>>> How can I modify this code to tell R to do the analysis group by group for
>>> each column so I get maf value for 33-55 group of column I, then maf value
>>> for group 44-66 in the same column I,then the rest of groups in this column
>>> and do the same for the remaining columns.
>>>
>>> In fact, I'm interested in doing this analysis for only 300 columns, not all
>>> of the 600 columns.
>>> I have another sheet contains names of columns of interest like this :
>>>
>>>> ColOfinterest
>>>
>>> Col
>>> I
>>> IV
>>> V
>>> .
>>> .
>>> 300
>>>
>>> Any one would help with the best combination of syntax to perform this
>>> complex analysis?
>>> >>> Regards >>> Allaisone >>> >>> >>> >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius Alameda, CA, USA 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.