[Comments in line below.]
On 10/31/2010 6:26 PM, Jim Silverton wrote:
I thought I would 'add' some meat to the problem I sent. This is all I know:
(1) f = a*X1 + (1-a)*X2
How do you know "f = a*X1 + (1-a)*X2"?
Why does this relationship make more sense than, e.g., log(f/(1-f)) =
a*X1 + (1-a)*X2?
(2) I know n values of f and X1, which happen to be probabilities
What do you mean that f and X1 are probabilities? Where do they come from?
Are they the results of binomial experiments, like tosses of a biased coin?
(3) I know nothing about X2 except that it also lies in (0,1)
(4) X1 is the probability under the null (Fisher's exact test) and X2 is the
alternative. But I have no idea what the alternative to Fisher's exact test
can be...
How do you compute X1? Do you compute it from data? If yes, then it's
more like an estimate of a probability rather than a probability
itself. Ditto for X2.
The preferred method for estimating almost anything whenever feasible is
to write a likelihood function, i.e., the probability of data as a
function of unknown parameters, then estimate those parameters to
maximize the likelihood. Even most nonparametric procedures can be
expressed this way for an appropriately chosen probability model.
In most situations, I prefer to eliminate constraints by
transformations, replacing f by ph = log(f/(1-f)), X1 by xi1 =
log(X1/(1-X1)), X2 by xi2 = log(X2/(1-X2)), and a by alpha =
log(a/(1-a)), then write a relationship between these unconstrained
variables and try to estimate alpha by maximum likelihood.
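Purely as an illustration of that approach (not part of the original reply), here is a minimal sketch that assumes Gaussian errors on the logit scale; the data frame name 'dat' and its columns f, X1, X2 are assumptions:

negloglik <- function(par, dat) {
    alpha <- par[1]                  # unconstrained parameter
    sigma <- exp(par[2])             # error sd, kept positive
    a     <- plogis(alpha)           # back-transform to (0, 1)
    pred  <- a * dat$X1 + (1 - a) * dat$X2
    ## assumed error model: logit(f) ~ Normal(logit(pred), sigma)
    -sum(dnorm(qlogis(dat$f), qlogis(pred), sigma, log = TRUE))
}
fit <- optim(c(0, 0), negloglik, dat = dat)
plogis(fit$par[1])   # maximum-likelihood estimate of a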
Hope this helps.
Spencer
(5) I need to estimate a (which is supposed to be a proportion).
(6) I was thinking about imputing values in (0,1), drawn from a beta
distribution, as the values of X2.
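(A minimal sketch of the imputation idea in (6), not from the original message; the sample size and Beta shape parameters are arbitrary placeholders:)

set.seed(1)
n <- 50                                          # placeholder sample size
X2_imputed <- rbeta(n, shape1 = 2, shape2 = 2)   # arbitrary Beta(2, 2) draws for X2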
Any help is greatly appreciated.
Jim...
On Sun, Oct 31, 2010 at 1:44 PM, David Winsemius <dwinsem...@comcast.net> wrote:
On Oct 31, 2010, at 12:54 PM, Spencer Graves wrote:
Have you tried the 'sos' package?
I have, and I am taking this opportunity to load it with my .Rprofile to
make it more accessible. It works very well. Very clean display. I also have
constructed a variant of RSiteSearch that I find more useful than the
current defaults.
rhelpSearch <- function(string,
                        restrict = c("Rhelp10", "Rhelp08", "Rhelp02", "functions"),
                        matchesPerPage = 100, ...) {
    RSiteSearch(string = string,
                restrict = restrict,
                matchesPerPage = matchesPerPage, ...)
}
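An illustrative call (the query string is arbitrary, not from the original post):

rhelpSearch("constrained regression", matchesPerPage = 50)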
install.packages('sos')   # if not already installed
library(sos)
cr <- ???'constrained regression'   # found 149 matches
summary(cr)                         # in 69 packages
cr   # opens a table in a browser listing all 169 matches with links to the help pages
However, I agree with Ravi Varadhan: I'd want to understand the
physical mechanism generating the data. If each is, for example, a
proportion, then I'd want to use logistic regression, possibly after some
approximate logistic transformation of X1 and X2 that prevents logit(X) from
going to +/-Inf. This is a different model, but it meets the need to
avoid predictions of Y going outside the range (0, 1).
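As a hedged sketch of that suggestion (not code from the thread): clamp X1 and X2 slightly away from 0 and 1 so qlogis() stays finite, then fit on the logit scale. The data frame 'dat', the clamping constant 'eps', and the quasibinomial family are all assumptions here, with Y taken to be a proportion.

eps <- 1e-4                                      # assumed clamping constant
sq  <- function(p) pmin(pmax(p, eps), 1 - eps)   # squeeze into (eps, 1 - eps)
fit <- glm(Y ~ qlogis(sq(X1)) + qlogis(sq(X2)),
           family = quasibinomial(link = "logit"), data = dat)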
No argument. I defer to both of your greater experience in such problems
and your interest in educating us less knowledgeable users. I also need to
amend my suggested strategy in situations where a linear model _might_ be
appropriate, since I think the inclusion of the surrogate variable in the
solve.QP setup is very probably creating confusion. After reconsideration I
think one should keep the two approaches separate. These are two approaches
to the no-intercept versions of the model that yield the same estimate (but
only because the constraints do not get invoked):
lm(medv ~ I(age - lstat) + offset(lstat) - 1, data = Boston)

Call:
lm(formula = medv ~ I(age - lstat) + offset(lstat) - 1, data = Boston)

Coefficients:
I(age - lstat)
        0.1163
library(MASS)   ## to access the Boston data
designmat <- model.matrix(medv ~ age + lstat - 1, data = Boston)
Dmat <- crossprod(designmat, designmat)
dvec <- crossprod(designmat, Boston$medv)
Amat <- cbind(1, diag(NROW(Dmat)))
bvec <- c(1, rep(0, NROW(Dmat)))
meq  <- 1
library(quadprog)
res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
zapsmall(res$solution)   # zapsmall not really needed in this instance
[1] 0.1163065 0.8836935
--
David.
Spencer
On 10/31/2010 9:01 AM, David Winsemius wrote:
On Oct 31, 2010, at 2:44 AM, Jim Silverton wrote:
Hello everyone,
I have 3 variables Y, X1 and X2. Each variable lies between 0 and 1. I want
to do a constrained regression such that a > 0 and (1 - a) > 0
for the model:
Y = a*X1 + (1-a)*X2
It would not accomplish the constraint that a > 0, but you could
accomplish the other constraint within an lm fit:
X3 <- X1 - X2
lm(Y ~ X3 + offset(X2))
The fitted model is Y ~ 1 + beta1*(X1 - X2) + 1*X2, i.e.
Y ~ intercept + beta1*X1 + (1 - beta1)*X2,
... so beta1 is a.
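A quick check of this offset trick on simulated data (not from the original post; the true value a = 0.3 and the noise level are arbitrary):

set.seed(42)
X1 <- runif(100); X2 <- runif(100)
a  <- 0.3
Y  <- a * X1 + (1 - a) * X2 + rnorm(100, sd = 0.02)
coef(lm(Y ~ I(X1 - X2) + offset(X2)))   # slope on I(X1 - X2) recovers ~0.3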
In the case that beta1 < 0, I suppose a would be assigned 0. This might be
accomplished within an iterative calculation framework by a large
penalization for negative values. In a reply (1) to a question by Carlos
Alzola in 2008 on r-help, Berwin Turlach offered a solution to a similar
problem (sum(coef) == 1 AND coef non-negative). Modifying his code to
incorporate the above strategy (and choosing two variables for which
parameter values might be inside the constraint boundaries), we get:
library(MASS)   ## to access the Boston data
designmat <- model.matrix(medv ~ I(age - lstat) + offset(lstat), data = Boston)
Dmat <- crossprod(designmat, designmat)
dvec <- crossprod(designmat, Boston$medv)
Amat <- cbind(1, diag(NROW(Dmat)))
bvec <- c(1, rep(0, NROW(Dmat)))
meq  <- 1
library(quadprog)
res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
zapsmall(res$solution)
[1] 0.686547 0.313453
Turlach specifically advised against any interpretation of this
particular result, which was constructed only to demonstrate the mathematical
mechanics.
I tried the help on constrained regression in R, but I concede that it
was not helpful.
I must not have that package installed, because I got nothing that
appeared to be useful with ??"constrained regression".
David Winsemius, MD
West Hartford, CT
1) http://finzi.psych.upenn.edu/Rhelp10/2008-March/155990.html
______________________________________________
David Winsemius, MD
West Hartford, CT
--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.