On Feb 23, 2012, at 3:27 PM, Hed Bar-Nissan wrote:

It really is weighting; my simplified example was just too simplified.
Here is my real weight vector:
> sc$W_FSCHWT
[1] 14.8579 61.9528 3.0420 2.9929 5.1239 14.7507 2.7535 2.2693 3.6658 8.6179 2.5926 2.5390 1.7354 2.9767 9.0477 2.6589 3.4040 3.0519
....

You should always convey the necessary complexity of the problem.


And still it should somehow set the case weight.
I could multiply all the weights by 10000 and perhaps use your method, but that would create a very bloated data frame.

Working with numeric variables only, I could probably compute weighted means.

But something as simple as WEIGHT BY would be nice.
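
For the tabulation part, base R may already get close to WEIGHT BY without repeating any rows. The following is only a sketch, assuming the toy data from the earlier message quoted further down; the name `d` is introduced here for illustration. `xtabs()` sums the weight column within each factor level, and `weighted.mean()` covers the numeric case. Neither requires whole-number weights.

```r
# Toy data from the earlier message in this thread (assumed here for illustration).
d <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Weighted counts: sum of 'weight' within each level of the factor.
wtab <- xtabs(weight ~ kindergarten_attendance, data = d)
wtab              # Yes 3, No 10

# Weighted proportions.
prop.table(wtab)

# Weighted mean of a numeric variable; here a 0/1 indicator for "Yes".
p_yes <- weighted.mean(as.numeric(d$kindergarten_attendance == "Yes"), d$weight)
p_yes             # 3/13
```

Passing `wtab` to `barplot()` should then draw the weighted bar chart directly.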

The survey package by Thomas Lumley provides for a wide variety of weighted analyses.
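
As a rough sketch of what that could look like on the toy data from earlier in the thread: the design below is an assumption made for illustration (no strata, clusters or replicate weights are declared), so only the point estimates should be read from it, not the standard errors. The names `d` and `des` are hypothetical.

```r
library(survey)  # CRAN package; install.packages("survey") if it is missing

# Toy data from the earlier message in this thread (assumed here for illustration).
d <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Simplified design: every row is its own sampling unit, weighted by 'weight'.
des <- svydesign(ids = ~1, weights = ~weight, data = d)

# Weighted counts per level.
tab <- svytable(~kindergarten_attendance, design = des)
tab   # Yes 3, No 10

# Weighted proportions per level (a factor is expanded into one column per level).
m <- svymean(~kindergarten_attendance, design = des)
m
```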

--
David.

tnx
Hed





On Thu, Feb 23, 2012 at 7:43 PM, David Winsemius <dwinsem...@comcast.net > wrote:

On Feb 23, 2012, at 10:49 AM, Hed Bar-Nissan wrote:

The need comes from the PISA data. (http://www.pisa.oecd.org)

In the data there are many cases, and each of them carries a numeric
variable that signifies its weight.
In SPSS the command would be "WEIGHT BY"

In simpler words, here is an R sample (what I get vs. what I want to get):


data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2,1,1,1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)
data.recieved
 kindergarten_attendance weight
1                      No     10
2                     Yes      1
3                     Yes      1
4                     Yes      1



data.weighted <- data.frame(
  kindergarten_attendance = factor(c(2,2,2,2,2,2,2,2,2,2,1,1,1),
                                   labels = c("Yes", "No"))
)

What you want is "case repetition", not case weighting; I would reserve the latter term for estimation problems:

> ( data.weighted <- unlist(sapply(1:NROW(data.recieved), function(x) rep(data.recieved[x,1], times=data.recieved[x,2] )) ) )
 [1] No  No  No  No  No  No  No  No  No  No  Yes Yes Yes
Levels: Yes No
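
The same repetition can probably be written without the sapply() loop, since rep() is vectorised over `times` and keeps factors as factors. A minimal sketch, assuming the `data.recieved` data frame from the example above and whole-number weights (rep() coerces `times` to integer, so fractional weights would be truncated):

```r
# Toy data as defined in the original example.
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Repeat each factor value as many times as its weight says.
data.weighted <- rep(data.recieved$kindergarten_attendance,
                     times = data.recieved$weight)

data.weighted          # 10 x "No" followed by 3 x "Yes"
table(data.weighted)   # Yes 3, No 10
```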




par(mfrow=c(1,2));
plot(data.recieved$kindergarten_attendance,main="What i get");
plot(data.weighted$kindergarten_attendance,main="What i want to get");

This seems to work on the factor vector. I didn't replicate data frame rows, but I expect you could.
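
For what it's worth, replicating whole data frame rows might look like the sketch below, which keeps `data.weighted` a data frame so that `data.weighted$kindergarten_attendance` still works in the plot() call. It assumes whole-number weights; `idx` is a helper name introduced here for illustration.

```r
# Toy data as defined in the original example.
data.recieved <- data.frame(
  kindergarten_attendance = factor(c(2, 1, 1, 1), labels = c("Yes", "No")),
  weight = c(10, 1, 1, 1)
)

# Row index in which row i appears weight[i] times.
idx <- rep(seq_len(nrow(data.recieved)), times = data.recieved$weight)

# Expand the rows; drop = FALSE keeps the one-column result a data frame.
data.weighted <- data.recieved[idx, "kindergarten_attendance", drop = FALSE]
rownames(data.weighted) <- NULL

nrow(data.weighted)                              # 13
table(data.weighted$kindergarten_attendance)     # Yes 3, No 10
```

With survey weights in the tens and several decimal places, this expansion gets large quickly, which is the bloat mentioned in the later message.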



tnx in advance
Hed


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


