On Wed, 2 Jun 2010, Misha Spisok wrote:

Hello,

I can't figure out why using and not using weights in mlogit yields
identical results.  My motivation is for the case when an
"observation" or "individual" represents a number of individuals.  For
example,

library(mlogit)
library(AER)
data("TravelMode", package = "AER")
TM <- mlogit.data(TravelMode, choice = "choice", shape = "long",
                alt.levels = c("air", "train", "bus", "car"))
myweight = rep(floor(1000*runif(nrow(TravelMode)/4)), each = 4)

summary(mlogit(choice ~ wait + vcost + travel + gcost, data=TM))
summary(mlogit(choice ~ wait + vcost + travel + gcost, weights=income, data=TM))
summary(mlogit(choice ~ wait + vcost + travel + gcost,
weights=myweight, data=TM))

Each gives the same result.

I can't replicate that. For me all three give different results. For example, the first two (which do not contain random elements) are

   alttrain      altbus      altcar        wait       vcost      travel
-0.84413818 -1.44150828 -5.20474275 -0.10364955 -0.08493182 -0.01333220
      gcost
 0.06929537

and

   alttrain      altbus      altcar        wait       vcost      travel
-1.56910793 -1.67020936 -5.44725428 -0.11157800 -0.08866886 -0.01435371
      gcost
 0.08087749

respectively. I'm using the current "mlogit" version from CRAN: 0.1-7.

Am I specifying "weights" incorrectly?

Yes, I think so.

Is there a better way to do what I want to do?  That is, if "myweight"
contains the number of observations represented by an "observation,"
is this the correct approach?

You will get the correct parameter estimates but not the correct inference. As in most of the basic model fitting functions (such as lm() or glm()), the weights are _not_ interpreted as case weights. I.e., the function treats
  sum(weights > 0)
as the number of observations and not
  sum(weights)
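
A quick way to check which count is used, sketched here for lm() only (the data and weights below are made up for illustration, and mlogit() is not called):

```r
# Hypothetical toy data: six rows, one of them with zero weight.
x <- 1:6
y <- c(0, 2, 1, 4, 5, 3)
w <- c(3, 1, 0, 2, 5, 4)

fm <- lm(y ~ x, weights = w)

# Residual df is based on the number of rows with non-zero weight,
# not on the total weight.
df.residual(fm)              # 3 here
sum(w > 0) - length(coef(fm))  # 5 - 2 = 3, matches
sum(w) - length(coef(fm))      # 15 - 2 = 13, what case weights would imply
```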

A simple example using lm():

  x <- 1:5
  y <- c(0, 2, 1, 4, 5)
  w <- rep(2, 5)
  xx <- c(x, x)
  yy <- c(y, y)

Then you can fit both models

  fm1 <- lm(y ~ x, weights = w)
  fm2 <- lm(yy ~ xx)

and you get the same coefficients

  all.equal(coef(fm1), coef(fm2))

(which only mentions that the strings 'xx' and 'x' are different.) But fm1 thinks 2 parameters have been estimated from 5 observations while fm2 thinks 2 parameters have been estimated from 10 observations. Hence the residual degrees of freedom and the covariance matrices differ by the same factor, (5 - 2)/(10 - 2) = 3/8:

  df.residual(fm1) / df.residual(fm2)
  vcov(fm2) / vcov(fm1)
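
If the weights really are case counts, one possible workaround for lm() is to rescale the covariance matrix of the weighted fit by the ratio of the two residual degrees of freedom. The following is only a sketch under that assumption; the names d2, p and vc_case are made up for illustration, and I have not checked whether the same rescaling carries over to mlogit().

```r
# Toy data from above: each of the 5 rows stands for w = 2 identical cases.
x <- 1:5
y <- c(0, 2, 1, 4, 5)
w <- rep(2, 5)

# Weighted fit (treats the data as 5 observations).
fm1 <- lm(y ~ x, weights = w)

# Expanded fit: replicate every row w times (sum(w) = 10 observations).
d2 <- data.frame(x = rep(x, w), y = rep(y, w))
fm2 <- lm(y ~ x, data = d2)

# Rescale the weighted covariance matrix to the case-weight df.
p <- length(coef(fm1))
vc_case <- vcov(fm1) * df.residual(fm1) / (sum(w) - p)

# Should agree with the covariance matrix from the expanded data.
all.equal(vc_case, vcov(fm2))
```

Standard errors based on case weights are then sqrt(diag(vc_case)), to be used with sum(w) - p residual degrees of freedom.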

Hope that helps,
Z


If so, what am I doing wrong?  If not,
what suggestions are there?

Thank you for your time.

Best,

Misha

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

