Thanks so much, Greg! I'm going to take all these suggestions -- yours, Jim Holtman's, Michael Weylandt's, and several others -- and spend a couple of days trying them out to see if I can make them work. I'll report back.

I'm going to have to look especially closely at loglin -- not primarily because I think it will provide a good way of implementing my RAS balancing extension (I do not think RAS balancing can be made equivalent to a log-linear model), but because it may provide a competing methodology for integrating all my margins. Also, RAS balancing converges to a consistent solution, but not usually to a unique one. It produces better-quality results if the initial values are not too far from the final ones. So I might deploy one of my modeling slogans, "Ignorance is independence," and construct a log-linear model as a starting place and a point of comparison.

If this works the way I hope it will, it will make both the American Community Survey Summary Files and the ACS public use microdata much more useful. I'm keeping my fingers crossed.

Thanks!
AndrewH

On Wed, Nov 28, 2012 at 1:02 PM, Greg Snow <538...@gmail.com> wrote:

> Yes, I meant FAQ 7.21, must have stuttered in typing, but it is always
> good to read other FAQs while looking for a specific one.
>
> I did read your full description, though whether I fully understand or not
> is yet to be seen.
>
> It seems like a lot of what you want to do could be simplified by using
> the apply and sweep functions, or possibly by the aaply function from the
> plyr package.
>
> The apply function works on any dimension of arrays. For example, a piece
> of code like out <- apply(myarray, c(1,5,7), sum) will sum over all the
> dimensions except 1, 5, and 7 and will return a 3-dimensional array. Then
> myarray2 <- sweep(myarray, c(1,5,7), out, FUN="/") will divide each element
> of the original array by the appropriate value of out. In this case running
> the apply again would give all 1's, but you could divide the out array by
> your target margins.
> If you had 2 lists, where the first has the vectors of
> margins to sum/sweep and the 2nd has your target margin arrays, then you
> could do a for loop passing the current elements of the lists to the
> correct place in the function calls.
>
> Here is some example code (but the forced margins probably don't make any
> sense):
>
> # dimensions to sum/sweep over, and the target margin for each
> dims <- list( 1:2, c(1,3), 2:3 )
> margins <- list( 10*matrix(1:16, 4),
>                  20*17/9*matrix(1:8, ncol=2),
>                  20*17/9*matrix(1:8, ncol=2) )
>
> old <- array(0, c(4,4,2))
> new <- HairEyeColor
> i <- 0
> # iterate until a full pass changes no cell by more than the tolerance
> while( max(abs(old-new)) > 0.0000001 ) {
>   i <- i + 1
>   cat('Iteration ',i,'\n')
>   flush.console()
>   if(i > 100) {
>     cat('did not converge\n')
>     break
>   }
>
>   old <- new
>
>   # rescale the array so each margin in turn matches its target
>   for(j in seq_along(dims) ) {
>     new <- sweep(new, dims[[j]],
>                  apply(new, dims[[j]], sum)/margins[[j]],
>                  FUN="/")
>   }
> }
>
>
> You might also want to look at the loglin function. As part of its
> computations it starts with a starting matrix/array (which by default is
> all 1's), then finds the array that has the same margins as the passed-in
> table. It probably uses an algorithm similar to what you want to do. If
> you can pass the appropriate pieces to loglin then it may compute what you
> want for you (and probably much quicker since it uses compiled code).
>
>
> On Tue, Nov 27, 2012 at 8:29 PM, andrewH <ahoer...@rprogress.org> wrote:
>
> >> Dear Greg
> >>
> >> You mean FAQ 7.21, not 7.22, correct? Though 7.12 also seems relevant.
> >> Though I would say I was asking about turning a string into an expression
> >> rather than a variable. At any rate, thanks for the pointer. I am sure I
> >> would benefit from rereading the FAQ on a monthly basis, until I actually
> >> know most of what is in it.
> >>
> >> As to your question about my question, I've wanted to do this exact thing
> >> several times in different contexts. However, you are quite correct that I
> >> am struggling with this problem in a particular context. I have a large,
> >> multi-dimensional object containing count data.
> >> Currently this object is
> >> implemented as a 26 dimensional (and growing) array with two to thirteen
> >> dimnames per dimension, though I am thinking of switching it to a data
> >> frame with dimensions as factors and dimname-equivalent factor levels.
> >>
> >> I need to take a lot of complicated partitions of this object, mainly,
> >> though not always, summing to the entire object. Most of the partitions
> >> are subsets of --
> >>
> >> OK, now I have to digress to address a terminological uncertainty. Think
> >> of a 4X4X4 cube. It has three dimensions, and each dimension has four of
> >> what? I'm going to call them levels right now, though I don't think that
> >> is right -- it would be confusing if there were factors in the picture.
> >> Also, the dimnames do not name the dimensions, but the things I am calling
> >> levels, which is also confusing. --
> >>
> >> Anyway, most of the partitions consist of two to four dimensions out of
> >> 24, but sometimes with some levels omitted or summed, and occasionally
> >> with partitions that are much more complicated (to deal with censored
> >> data, mainly). I have to use each partition multiple times, doing a very
> >> different thing each time (and then repeat the whole set many times). The
> >> next 4 paragraphs describe what I am actually doing with the partitions,
> >> but you can skip over them and cut to the chase if you are not so
> >> interested.
> >>
> >> I am summing over the dimensions in each partition, dividing a table of
> >> forcing totals for that partition by those sums (element by element),
> >> and then taking the resulting ratios and multiplying each of the terms in
> >> the original, non-summed object by the corresponding ratio.
> >>
> >> This is easiest to understand by analogy to the two-dimensional case. You
> >> take a vector of pre-determined row forcing totals and divide it, element
> >> by element, by the row sums, to get a vector of forcing ratios.
> >> Then
> >> you multiply each row by the corresponding forcing ratio, so that the row
> >> sum will then match the forcing total. Then you do the same thing with the
> >> columns. Repeat, alternating rows and columns, to convergence. Each column
> >> has a corresponding column forcing total, and each row has a corresponding
> >> row forcing total. The elements of the matrix have two partitions that we
> >> use, one into rows, and the other into columns. This is sometimes called
> >> RAS balancing, or biproportional matrix adjustment. It is an algorithm
> >> that is used a lot to update big matrices in national income accounting
> >> and input-output analysis.
> >>
> >> What I am doing is the same, but I have forcing totals in two- to four-
> >> dimensional tables instead of one-dimensional vectors. Each partition
> >> divides the array into groups of elements that I want to sum to my forcing
> >> totals. Again, you go around in a circle, doing forcing with each of the
> >> (currently 18) tables, to convergence. On count data it should always
> >> converge.
> >>
> >> The thing is, I need to keep track of all these partitions, and then
> >> multiply the forcing ratios by the exact same elements of the array as I
> >> previously summed. I got up to five dimensions, coding by hand, and then
> >> realized that 1. the amount of work in going from, e.g., 19 dimensions to
> >> 20 was going to be very great, and 2. the likelihood that I would get all
> >> the nesting and partition-matching right was vanishingly small.
> >>
> >> So I am looking for a way to encode the partitions that I use, that would
> >> allow me to use the same encoding to represent both the subsets of the
> >> array to sum over, crunching the array down to a set of totals
> >> corresponding to my forcing totals, and also defining the subsets of the
> >> array that should be multiplied by each forcing ratio. And I thought,
> >> maybe I could do it with strings of indexing commands, one per table of
> >> forcing totals.
> >> But this
> >> will only work if I can sum the array over the subdivisions that the
> >> partition defines, multiply all the elements in partition subdivisions by
> >> the corresponding constants, and then assign the results back to the
> >> array, or to a new array. Hence my question.
> >>
> >> I'm afraid that this explanation is too long for people to read, but hope
> >> springs eternal. I'd be remarkably pleased and eternally grateful if I
> >> got a solution to the problem of keeping track of partitions that can be
> >> used in the three ways described in the previous paragraph, even if it has
> >> nothing to do with executing strings.
> >>
> >> Warmest regards,
> >> andrewH
> >>
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4651073.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com
>

--
J. Andrew Hoerner
Director, Sustainable Economics Program
Redefining Progress
(510) 507-4820
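
For anyone following the thread who wants to experiment, here is a minimal sketch of the two-dimensional row/column balancing described in the quoted message, written with the sweep function Greg suggested. It is only an illustration, not the code actually used for the ACS work: the function name ras2d, its arguments, the tolerance, and the small example matrix are all made up for this sketch. It assumes every cell is positive and that the row and column forcing totals add up to the same grand total.

```r
# Hypothetical helper (not from the thread): two-dimensional RAS balancing.
# Assumes all cells of `start` are positive and that the row and column
# forcing totals share the same grand total.
ras2d <- function(start, row_targets, col_targets,
                  tol = 1e-8, max_iter = 100) {
  stopifnot(isTRUE(all.equal(sum(row_targets), sum(col_targets))))
  new <- start
  for (i in seq_len(max_iter)) {
    old <- new
    # Row step: forcing ratio = row forcing total / current row sum,
    # then multiply every cell in a row by that row's ratio.
    new <- sweep(new, 1, row_targets / rowSums(new), FUN = "*")
    # Column step: same idea, applied to the columns.
    new <- sweep(new, 2, col_targets / colSums(new), FUN = "*")
    # Stop once a full row-and-column pass changes no cell by more than tol.
    if (max(abs(new - old)) < tol) break
  }
  new
}

# Made-up starting counts and forcing totals (both sets sum to 300).
start <- matrix(c(10, 20, 30,
                  40, 50, 60), nrow = 2, byrow = TRUE)
balanced <- ras2d(start,
                  row_targets = c(100, 200),
                  col_targets = c(90, 110, 100))

# Compare the balanced margins with the forcing totals.
print(rowSums(balanced))
print(colSums(balanced))
```

The higher-dimensional version in Greg's example follows the same pattern: replace rowSums/colSums with apply(new, dims[[j]], sum) and loop over the list of margins instead of alternating between two of them.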