On Mar 22, 2015, at 1:12 PM, Luca Meyer wrote:

> Hi Bert,
>
> Maybe I did not explain myself clearly enough, but let me show you with a manual example that what I would like to do is indeed feasible.
>
> The following is also available for download from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0
>
> rm(list=ls())
>
> This is (an extract of) the usual INPUT file I have:
>
> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
> "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806, 0.00273, 1.42917,
> 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
> c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
>
> These are the initial marginal distributions:
>
> aggregate(v4~v1*v2,f1,sum)
> aggregate(v4~v3,f1,sum)
>
> First I order the file so that the 6 distinct v1xv2 combinations are nicely listed.
>
> f1 <- f1[order(f1$v1,f1$v2),]
>
> Then I compute (manually) the relative importance of each v1xv2 combination:
>
> tAA <- (18.18530+1.42917)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.00000+0.00000)
> # this is for combination v1=A & v2=A
> tAB <- (3.43806+1.05786)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.00000+0.00000)
> # this is for combination v1=A & v2=B
> tAC <- (0.00273+0.00042)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.00000+0.00000)
> # this is for combination v1=A & v2=C
> tBA <- (2.37232+1.13430)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.00000+0.00000)
> # this is for combination v1=B & v2=A
> tBB <- (3.01835+0.92872)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.00000+0.00000)
> # this is for combination v1=B & v2=B
> tBC <- (0.00000+0.00000)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.00000+0.00000)
> # this is for combination v1=B & v2=C
> # and just to make sure I have not made mistakes, the following should be equal to 1
> tAA+tAB+tAC+tBA+tBB+tBC
>
> Next, I know I need to increase v4 any time v3=B, and the total increase I need over the whole dataset is 29-27.01676=1.98324. In turn, I need to diminish v4 any time v3=C by the same amount (4.55047-2.56723=1.98324). This aspect was perhaps not clear at first: I need to move v4 across v3 categories, but the totals always remain unchanged.
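The six shares typed in by hand above (tAA ... tBC) and the 1.98324 shift can also be computed directly from the data. A minimal sketch, assuming f1 is the 12-row extract from this message (rebuilt with `data.frame()` and default row names, which the computation does not depend on) and that 29 is the desired total for v3 = "B"; `totals` and `delta` are illustrative names, not part of the original code:

```r
# The 12-row extract from the message (same values, default row names)
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# Sum of v4 within each v1 x v2 cell, then each cell's share of the grand
# total (the quantities tAA, tAB, ..., tBC computed by hand above)
totals <- aggregate(v4 ~ v1 + v2, data = f1, FUN = sum)
totals$share <- totals$v4 / sum(f1$v4)
print(totals)

# Same sanity check as tAA + tAB + tAC + tBA + tBB + tBC == 1
stopifnot(isTRUE(all.equal(sum(totals$share), 1)))

# Amount to move into v3 == "B" and out of v3 == "C", assuming a B target of 29
delta <- 29 - sum(f1$v4[f1$v3 == "B"])
print(delta)  # about 1.98324
```

Because `aggregate()` returns one row per observed v1xv2 combination, the same lines should work unchanged on the full 8x13 set of cells.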
>
> Since I want the data alteration to be proportional to the v1xv2 combinations, I do the following:
>
> f1$v4 <- ifelse(f1$v1=="A" & f1$v2=="A" & f1$v3=="B", f1$v4+(tAA*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="A" & f1$v2=="A" & f1$v3=="C", f1$v4-(tAA*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="A" & f1$v2=="B" & f1$v3=="B", f1$v4+(tAB*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="A" & f1$v2=="B" & f1$v3=="C", f1$v4-(tAB*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="A" & f1$v2=="C" & f1$v3=="B", f1$v4+(tAC*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="A" & f1$v2=="C" & f1$v3=="C", f1$v4-(tAC*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="B" & f1$v2=="A" & f1$v3=="B", f1$v4+(tBA*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="B" & f1$v2=="A" & f1$v3=="C", f1$v4-(tBA*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="B" & f1$v2=="B" & f1$v3=="B", f1$v4+(tBB*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="B" & f1$v2=="B" & f1$v3=="C", f1$v4-(tBB*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="B" & f1$v2=="C" & f1$v3=="B", f1$v4+(tBC*1.98324), f1$v4)
> f1$v4 <- ifelse(f1$v1=="B" & f1$v2=="C" & f1$v3=="C", f1$v4-(tBC*1.98324), f1$v4)
>
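All twelve `ifelse()` calls apply the same rule: add share × 1.98324 to the v3 = "B" row of each v1xv2 cell and subtract the same amount from its v3 = "C" row. A vectorized sketch of that rule with `ave()`, assuming (as in this extract) that every v1xv2 cell has exactly one B row and one C row; `direction` and `v4new` are illustrative names, and the B target of 29 is taken from the message:

```r
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# Each row gets the share of its own v1 x v2 cell (tAA for A/A rows, etc.)
share <- ave(f1$v4, f1$v1, f1$v2, FUN = sum) / sum(f1$v4)

# Total to move from v3 == "C" to v3 == "B" (assumed B target of 29)
delta <- 29 - sum(f1$v4[f1$v3 == "B"])

# +1 for B rows, -1 for C rows, 0 for any other v3 level
direction <- ifelse(f1$v3 == "B", 1, ifelse(f1$v3 == "C", -1, 0))
f1$v4new <- f1$v4 + direction * share * delta

# v1 x v2 totals should be unchanged; v3 totals should become 29 and 2.56723
aggregate(cbind(v4, v4new) ~ v1 + v2, data = f1, FUN = sum)
aggregate(cbind(v4, v4new) ~ v3, data = f1, FUN = sum)
```

Writing the result to a new column rather than overwriting v4 keeps the original values available for the before/after comparison. If a cell lacked either its B or its C row, its total would no longer be preserved under this rule.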
Seems that this could be done a lot more simply with a lookup array and ordinary indexing:

> lookarr <- array(NA, dim=c(length(unique(f1$v1)),length(unique(f1$v2)),length(unique(f1$v3))), dimnames=list( unique(f1$v1), unique(f1$v2), unique(f1$v3) ) )
> lookarr[] <- c(tAA,tAA,tAB,tAB,tAC,tAC,tBA,tBA, tBB, tBB, tBC, tBC)
> lookarr[ "A","B","C"]
[1] 0.1250369
> lookarr[ with(f1, cbind(v1, v2, v3)) ]
 [1] 6.213554e-01 1.110842e-01 1.424236e-01 1.250369e-01 9.978703e-05
 [6] 0.000000e+00 6.213554e-01 1.110842e-01 1.424236e-01 1.250369e-01
[11] 9.978703e-05 0.000000e+00
> f1$v4mod <- f1$v4*lookarr[ with(f1, cbind(v1,v2,v3)) ]
> f1
    v1 v2 v3       v4        v4mod
2    A  A  B 18.18530 1.129954e+01
41   A  A  C  1.42917 1.587582e-01
9    A  B  B  3.43806 4.896610e-01
48   A  B  C  1.05786 1.322716e-01
11   A  C  B  0.00273 2.724186e-07
50   A  C  C  0.00042 0.000000e+00
158  B  A  B  2.37232 1.474054e+00
197  B  A  C  1.13430 1.260028e-01
165  B  B  B  3.01835 4.298844e-01
204  B  B  C  0.92872 1.161243e-01
167  B  C  B  0.00000 0.000000e+00
206  B  C  C  0.00000 0.000000e+00

-- 
david.

> These are the final marginal distributions:
>
> aggregate(v4~v1*v2,f1,sum)
> aggregate(v4~v3,f1,sum)
>
> Can this procedure be made programmatic so that I can run it on the (8x13x13) categories matrix? If so, how would you do it? I am having a really hard time doing it with some (semi)automatic procedure.
>
> Thank you very much indeed once more :)
>
> Luca
>
>
> 2015-03-22 18:32 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>:
>
>> Nonsense. You are not telling us something or I have failed to understand something.
>>
>> Consider:
>>
>> v1 = c("a","b")
>> v2 = c("a","a")
>>
>> It is not possible to change the value of a sum of values corresponding to v2="a" without also changing that for v1, which is not supposed to change according to my understanding of your specification.
>>
>> So I'm done.
>>
>> -- Bert
>>
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> (650) 467-7374
>>
>> "Data is not information. Information is not knowledge.
>> And knowledge is certainly not wisdom."
>> Clifford Stoll
>>
>>
>> On Sun, Mar 22, 2015 at 8:28 AM, Luca Meyer <lucam1...@gmail.com> wrote:
>>> Sorry, I forgot to keep the rest of the group in the loop - Luca
>>> ---------- Forwarded message ----------
>>> From: Luca Meyer <lucam1...@gmail.com>
>>> Date: 2015-03-22 16:27 GMT+01:00
>>> Subject: Re: [R] Joining two datasets - recursive procedure?
>>> To: Bert Gunter <gunter.ber...@gene.com>
>>>
>>>
>>> Hi Bert,
>>>
>>> That is exactly what I am trying to achieve. Please notice that negative v4 values are allowed. I have done a similar task in the past manually, by recursively altering the v4 distribution across v3 categories within each fixed v1&v2 combination, so I am quite positive it can be achieved. But honestly, it took me forever to do it manually, and since this is likely to be an exercise I need to repeat from time to time, I wish I could learn how to do it programmatically....
>>>
>>> Thanks again for any further suggestion you might have,
>>>
>>> Luca
>>>
>>>
>>> 2015-03-22 16:05 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>:
>>>
>>>> Oh, wait a minute ...
>>>>
>>>> You still want the marginals for the other columns to be as originally?
>>>>
>>>> If so, then this is impossible in general as the sum of all the values must be what they were originally and you cannot therefore choose your values for V3 arbitrarily.
>>>>
>>>> Or at least, that seems to be what you are trying to do.
>>>>
>>>> -- Bert
>>>>
>>>> On Sun, Mar 22, 2015 at 7:55 AM, Bert Gunter <bgun...@gene.com> wrote:
>>>>> I would have thought that this is straightforward given my previous email...
>>>>>
>>>>> Just set z to what you want -- e.g., all B values to 29/number of B's, and all C values to 2.567/number of C's (etc. for more categories).
>>>>>
>>>>> A slick but sort of cheat way to do this programmatically -- in the sense that it relies on the implementation of factor() rather than its API -- is:
>>>>>
>>>>> y <- f1$v3 ## to simplify the notation; could be done using with()
>>>>> z <- (c(29,2.567)/table(y))[c(y)]
>>>>>
>>>>> Then proceed to z1 as I previously described
>>>>>
>>>>> -- Bert
>>>>>
>>>>> On Sun, Mar 22, 2015 at 2:00 AM, Luca Meyer <lucam1...@gmail.com> wrote:
>>>>>> Hi Bert, hello R-experts,
>>>>>>
>>>>>> I am close to a solution, but I still need one hint w.r.t. the following procedure (also available from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0)
>>>>>>
>>>>>> rm(list=ls())
>>>>>>
>>>>>> # this is (an extract of) the INPUT file I have:
>>>>>> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
>>>>>> "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A",
>>>>>> "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C",
>>>>>> "C"), v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042, 2.37232,
>>>>>> 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"),
>>>>>> class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L,
>>>>>> 167L, 197L, 204L, 206L))
>>>>>>
>>>>>> # this is the procedure that Bert suggested (slightly adjusted):
>>>>>> z <- rnorm(nrow(f1)) ## or anything you want
>>>>>> z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5)
>>>>>> aggregate(v4~v1*v2,f1,sum)
>>>>>> aggregate(z1~v1*v2,f1,sum)
>>>>>> aggregate(v4~v3,f1,sum)
>>>>>> aggregate(z1~v3,f1,sum)
>>>>>>
>>>>>> My question to you is: how can I set z so that I can obtain specific values for z1-v4 in the v3 aggregation?
>>>>>> In other words, how can I configure the procedure so that, e.g., B=29 and C=2.56723 after running the procedure:
>>>>>> aggregate(z1~v3,f1,sum)
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Luca
>>>>>>
>>>>>> PS: to avoid any doubts you might have about who I am, the following is my web page: http://lucameyer.wordpress.com/
>>>>>>
>>>>>>
>>>>>> 2015-03-21 18:13 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>:
>>>>>>>
>>>>>>> ... or cleaner:
>>>>>>>
>>>>>>> z1 <- with(f1,v4 + z -ave(z,v1,v2,FUN=mean))
>>>>>>>
>>>>>>>
>>>>>>> Just for curiosity, was this homework? (in which case I should probably have not provided you an answer -- that is, assuming that I HAVE provided an answer).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Bert
>>>>>>>
>>>>>>> On Sat, Mar 21, 2015 at 7:53 AM, Bert Gunter <bgun...@gene.com> wrote:
>>>>>>>> z <- rnorm(nrow(f1)) ## or anything you want
>>>>>>>> z1 <- f1$v4 + z - with(f1,ave(z,v1,v2,FUN=mean))
>>>>>>>>
>>>>>>>>
>>>>>>>> aggregate(v4~v1,f1,sum)
>>>>>>>> aggregate(z1~v1,f1,sum)
>>>>>>>> aggregate(v4~v2,f1,sum)
>>>>>>>> aggregate(z1~v2,f1,sum)
>>>>>>>> aggregate(v4~v3,f1,sum)
>>>>>>>> aggregate(z1~v3,f1,sum)
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Bert
>>>>>>>>
>>>>>>>> On Sat, Mar 21, 2015 at 6:49 AM, Luca Meyer <lucam1...@gmail.com> wrote:
>>>>>>>>> Hi Bert,
>>>>>>>>>
>>>>>>>>> Thank you for your message. I am looking into ave() and tapply() as you suggested, but at the same time I have prepared an example of input and output files, just in case you or someone else would like to attempt to write code that goes from input to output.
>>>>>>>>>
>>>>>>>>> Please see below or download it from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0
>>>>>>>>>
>>>>>>>>> # this is (an extract of) the INPUT file I have:
>>>>>>>>> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
>>>>>>>>> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
>>>>>>>>> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
>>>>>>>>> "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806, 0.00273, 1.42917,
>>>>>>>>> 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
>>>>>>>>> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
>>>>>>>>> c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
>>>>>>>>>
>>>>>>>>> # this is (an extract of) the OUTPUT file I would like to obtain:
>>>>>>>>> f2 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
>>>>>>>>> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
>>>>>>>>> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
>>>>>>>>> "B", "B", "B", "C", "C", "C"), v4 = c(17.83529, 3.43806, 0.00295, 1.77918,
>>>>>>>>> 1.05786, 0.0002, 2.37232, 3.01835, 0, 1.13430, 0.92872,
>>>>>>>>> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
>>>>>>>>> c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
>>>>>>>>>
>>>>>>>>> # please notice that while the
>>>>>>>>> # aggregated v4 on v3 has changed …
>>>>>>>>> aggregate(f1[,c("v4")],list(f1$v3),sum)
>>>>>>>>> aggregate(f2[,c("v4")],list(f2$v3),sum)
>>>>>>>>>
>>>>>>>>> # … the aggregated v4 over v1xv2 has remained unchanged:
>>>>>>>>> aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)
>>>>>>>>> aggregate(f2[,c("v4")],list(f2$v1,f2$v2),sum)
>>>>>>>>>
>>>>>>>>> Thank you very much in advance for your assistance.
>>>>>>>>>
>>>>>>>>> Luca
>>>>>>>>>
>>>>>>>>> 2015-03-21 13:18 GMT+01:00 Bert Gunter <gunter.ber...@gene.com>:
>>>>>>>>>>
>>>>>>>>>> 1. Still not sure what you mean, but maybe look at ?ave and ?tapply, for which ave() is a wrapper.
>>>>>>>>>>
>>>>>>>>>> 2. You still need to heed the rest of Jeff's advice.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Bert
>>>>>>>>>>
>>>>>>>>>> On Sat, Mar 21, 2015 at 4:53 AM, Luca Meyer <lucam1...@gmail.com> wrote:
>>>>>>>>>>> Hi Jeff & other R-experts,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your note. I have tried to solve the issue myself, without success.
>>>>>>>>>>>
>>>>>>>>>>> Following your suggestion, I am providing a sample of the dataset I am using below (also downloadable in plain text from https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0):
>>>>>>>>>>>
>>>>>>>>>>> #this is an extract of the overall dataset (n=1200 cases)
>>>>>>>>>>> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
>>>>>>>>>>> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
>>>>>>>>>>> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
>>>>>>>>>>> "B", "B", "B", "C", "C", "C"), v4 = c(18.1853007621835, 3.43806581506388,
>>>>>>>>>>> 0.002733567617055, 1.42917483425029, 1.05786640463504, 0.000420548864162308,
>>>>>>>>>>> 2.37232740842861, 3.01835841813241, 0, 1.13430282139936, 0.928725667117666,
>>>>>>>>>>> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
>>>>>>>>>>> c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
>>>>>>>>>>>
>>>>>>>>>>> I need to find an automated procedure that allows me to adjust the v3 marginals while keeping the v1xv2 marginals unchanged.
>>>>>>>>>>>
>>>>>>>>>>> That is: modify the v4 values you can find by running:
>>>>>>>>>>>
>>>>>>>>>>> aggregate(f1[,c("v4")],list(f1$v3),sum)
>>>>>>>>>>>
>>>>>>>>>>> while keeping constant the values you can find by running:
>>>>>>>>>>>
>>>>>>>>>>> aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)
>>>>>>>>>>>
>>>>>>>>>>> Now does it make sense?
>>>>>>>>>>>
>>>>>>>>>>> Please notice I have tried to build some syntax that modifies values within each v1xv2 combination by computing the sum of v4 and the row percentage in terms of v4, and that is where my effort is blocked. I am not really sure how I should proceed. Any suggestion?
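One way to generalize the manual procedure at the top of the thread to any number of v3 levels is to compute, for each v3 level, a gap (target total minus current total) and spread that gap over the v1xv2 cells in proportion to each cell's share of the grand total. Each cell then receives share × gap for every v3 level; because the gaps sum to zero the cell totals stay fixed, while each v3 total moves by exactly its gap. This is a sketch under explicit assumptions, not a tested solution for the full data: it assumes one row per v1 x v2 x v3 combination (8x13x13 = 1352 rows, so with n = 1200 the missing combinations would first have to be added with v4 = 0) and targets that add up to the current grand total. `shift_v3_margin` is a hypothetical helper name:

```r
# Hypothetical helper: shift v4 across v3 levels so that the v3 totals hit
# `target`, spreading each level's gap over the v1 x v2 cells in proportion
# to each cell's share of the grand total of v4.
# Assumes one row per v1 x v2 x v3 combination and sum(target) == sum(v4).
shift_v3_margin <- function(df, target) {
  v3 <- as.character(df$v3)                 # avoid indexing by factor codes
  current <- sapply(split(df$v4, v3), sum)  # current v3 totals, named by level
  stopifnot(setequal(names(target), names(current)),
            isTRUE(all.equal(sum(target), sum(current))))
  gap <- target[names(current)] - current   # adjustment needed per v3 level
  share <- ave(df$v4, df$v1, df$v2, FUN = sum) / sum(df$v4)
  unname(df$v4 + share * gap[v3])
}

f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

f1$v4new <- shift_v3_margin(f1, c(B = 29, C = 2.56723))
aggregate(cbind(v4, v4new) ~ v1 + v2, data = f1, FUN = sum)  # columns equal
aggregate(cbind(v4, v4new) ~ v3, data = f1, FUN = sum)       # v4new: 29, 2.56723
```

Adjusted values can become negative when a cell's portion of a negative gap exceeds its current value; the messages say negative v4 values are allowed, so the sketch does not guard against this.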
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Luca
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2015-03-19 2:38 GMT+01:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>:
>>>>>>>>>>>
>>>>>>>>>>>> I don't understand your description. The standard practice on this list is to provide a reproducible R example [1] of the kind of data you are working with (and any code you have tried) to go along with your description. In this case, that would be two dputs of your input data frames and a dput of an output data frame (generated by hand from your input data frame). (Probably best to not use the full number of input values just to keep the size down.) We could then make an attempt to generate code that goes from input to output.
>>>>>>>>>>>>
>>>>>>>>>>>> Of course, if you post that hard work using HTML then it will get corrupted (much like the text below from your earlier emails) and we won't be able to use it. Please learn to post from your email software using plain text when corresponding with this mailing list.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------------------------------------------------------------------
>>>>>>>>>>>> Jeff Newmiller    DCN:<jdnew...@dcn.davis.ca.us>
>>>>>>>>>>>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>>>>>>>>>>>> ---------------------------------------------------------------------------
>>>>>>>>>>>> Sent from my phone. Please excuse my brevity.
>>>>>>>>>>>>
>>>>>>>>>>>> On March 18, 2015 9:05:37 AM PDT, Luca Meyer <lucam1...@gmail.com> wrote:
>>>>>>>>>>>>> Thanks for your input, Michael,
>>>>>>>>>>>>>
>>>>>>>>>>>>> The continuous variable I have measures quantities (down to the 3rd decimal place), so unfortunately they are not frequencies.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any more specific suggestions on how that could be tackled?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks & kind regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Luca
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ===
>>>>>>>>>>>>>
>>>>>>>>>>>>> Michael Friendly wrote:
>>>>>>>>>>>>> I'm not sure I understand completely what you want to do, but if the data were frequencies, it sounds like a task for fitting a loglinear model with the model formula
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~ V1*V2 + V3
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/18/2015 2:17 AM, Luca Meyer wrote:
>>>>>>>>>>>>> > Hello,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I am facing a quite challenging task (at least to me) and I was wondering if someone could advise how R could assist me to speed the task up.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I am dealing with a dataset with 3 discrete variables and one continuous variable.
>>>>>>>>>>>>> > The discrete variables are:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > V1: 8 modalities
>>>>>>>>>>>>> > V2: 13 modalities
>>>>>>>>>>>>> > V3: 13 modalities
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > The continuous variable V4 is a decimal number that is always greater than zero in the marginals of each of the 3 variables, but it is sometimes equal to zero (and sometimes negative) in the joint tables.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I have 2 files:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > => one with the distribution of all possible combinations of V1xV2 (some of which are zero or negative) and
>>>>>>>>>>>>> > => one with the marginal distribution of V3.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > I am trying to build the long and narrow dataset V1xV2xV3 in such a way that each V1xV2 cell does not get modified and V3 fits its marginal distribution as closely as possible. Does it make sense?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > To be even more specific, my 2 input files look like the following.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > FILE 1
>>>>>>>>>>>>> > V1,V2,V4
>>>>>>>>>>>>> > A, A, 24.251
>>>>>>>>>>>>> > A, B, 1.065
>>>>>>>>>>>>> > (...)
>>>>>>>>>>>>> > B, C, 0.294
>>>>>>>>>>>>> > B, D, 2.731
>>>>>>>>>>>>> > (...)
>>>>>>>>>>>>> > H, L, 0.345
>>>>>>>>>>>>> > H, M, 0.000
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > FILE 2
>>>>>>>>>>>>> > V3, V4
>>>>>>>>>>>>> > A, 1.575
>>>>>>>>>>>>> > B, 4.294
>>>>>>>>>>>>> > C, 10.044
>>>>>>>>>>>>> > (...)
>>>>>>>>>>>>> > L, 5.123
>>>>>>>>>>>>> > M, 3.334
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > What I need to achieve is a file such as the following
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > FILE 3
>>>>>>>>>>>>> > V1, V2, V3, V4
>>>>>>>>>>>>> > A, A, A, ???
>>>>>>>>>>>>> > A, A, B, ???
>>>>>>>>>>>>> > (...)
>>>>>>>>>>>>> > D, D, E, ???
>>>>>>>>>>>>> > D, D, F, ???
>>>>>>>>>>>>> > (...)
>>>>>>>>>>>>> > H, M, L, ???
>>>>>>>>>>>>> > H, M, M, ???
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I recover exactly FILE 1, and if I aggregate on V3 I recover a file as close as possible to FILE 2 (ideally the same file).
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Can anyone suggest how I could do that with R?
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thank you very much indeed for any assistance you are able to provide.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Kind regards,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Luca

David Winsemius
Alameda, CA, USA
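A note on the lookup-array reply: R fills an array in column-major order, so `lookarr[] <- c(tAA, tAA, tAB, tAB, ...)` varies v1 fastest, then v2, then v3. With that fill, `lookarr["A","B","C"]` picks up tBB (the 0.1250369 printed in the reply) instead of tAB, and `v4mod` multiplies v4 by the share rather than adding or subtracting share × 1.98324 as in the manual procedure. A sketch that builds the lookup with `tapply()` so the values stay tied to their dimnames, then applies the additive shift; the B target of 29 and the sign vector `sgn` are assumptions taken from the manual example:

```r
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# 2 x 3 matrix of v1 x v2 shares of the grand total (rows v1, columns v2)
share <- tapply(f1$v4, list(f1$v1, f1$v2), sum) / sum(f1$v4)

# Recycle the matrix across the two v3 layers: matrix and array are both
# column-major, so layer "B" and layer "C" each receive a full copy
lookarr <- array(share, dim = c(dim(share), 2),
                 dimnames = c(dimnames(share), list(c("B", "C"))))
lookarr["A", "B", "C"]  # tAB, about 0.1424

# Additive shift as in the manual procedure: + for B rows, - for C rows
delta <- 29 - sum(f1$v4[f1$v3 == "B"])
sgn <- c(B = 1, C = -1)
idx <- cbind(f1$v1, f1$v2, f1$v3)  # character matrix, matched against dimnames
f1$v4new <- unname(f1$v4 + sgn[f1$v3] * lookarr[idx] * delta)
```

Because every value carries its dimnames, the fill order no longer has to be worked out by hand when the array grows to 8x13x13.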