On May 25, 7:46 pm, [EMAIL PROTECTED] wrote: > > I think this problem is related to integer partitioning, but it's not <snip> > > If, by "integer partitioning," you mean the "subset sum > problem" (given a finite set S of integers, does S contain a subset > which sums up to some given integer k?), then you are right. I'm > reasonably sure there's a polynomial reduction from your problem to > the subset sum problem. That would make your problem NP-complete.
Wow, that's helpful. http://en.wikipedia.org/wiki/Subset_sum clarifies what I'm trying to do. Yes, it probably is NP-complete. Fortunately my use case doesn't require that the process finish, just that it produce a lot. I can exclude trivial cases (e.g., f' = f(S) + s_1 - s_1) with some sort of simplifier. > Like I said, because this looks related to the subset sum problem > (though harder, because you're asking for "all" such functions, for > some rigorous definition of "all," as per the previous sentence), I > suspect it's NP-complete. However, there are probably some good > heuristics, such as if s1 has a large intersection with s0, it's > probably a good idea to use s1 in whatever formula you come up with. There are a few real-world constraints which make the problem easier, but only by linear factors, I suspect. More on this below. > Other than that, I don't really have an idea. Can you say what the > application of this algorithm would be? Knowing the use case might > suggest a better approach. This is a problem in statistical estimation. Given n records made up of k variables, define a cell as the point in the cartesian product of v_1 * v_2 * ... * v_k. I want to apply an estimator on the data in each cell. However, many cells are empty, and others are too sparse to support estimation. One solution is to build lots of valid aggregations, called "strata," for which estimates can be made for chunks of cells. A "valid aggregation" is a sequence of *adjacent* cells (see below) whose data are pooled (by whatever mechanism is relevant to the estimator). The cells are "elements" in my initial problem statement, and the strata are the sets. The constraints I mentioned are on the composition of strata: strata can only contain sequences of adjacent cells, where adjacency is defined for each variable. Two cells are adjacent if they are equal on every variable except one, and on that one and by its definition, they are adjacent. Does this make it easier to understand, or more difficult? thanks -- PB. -- http://mail.python.org/mailman/listinfo/python-list