George,
Perhaps the site of the RISQ project (Representativity indicators for
Survey Quality) might be of use: http://www.risq-project.eu/ . They
also provide R-code to calculate their indicators.
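
As a rough illustration of what such an indicator measures: the R-indicator is defined as R = 1 - 2*S(rho), where S(rho) is the standard deviation of the estimated response propensities over the population. The sketch below is not the RISQ project's code. It assumes that region is the only auxiliary variable, so that each employee's propensity is simply the response rate of his or her region, and it uses the counts from the message quoted below. The names emp, resp, rho and r_indicator are made up for the example.

```r
# Region counts copied from the quoted message (Region_1 ... Region_16)
emp  <- c(735, 500, 897, 717, 167, 309, 806, 627,
          858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)

# Assumption: every employee's propensity equals the regional response rate
rho     <- resp / emp
rho_bar <- sum(resp) / sum(emp)   # overall response rate
N       <- sum(emp)

# Standard deviation of the propensities over all N employees
# (each region's rate is counted once per employee in that region)
S <- sqrt(sum(emp * (rho - rho_bar)^2) / (N - 1))

# R-indicator: 1 means identical response rates in every region
r_indicator <- 1 - 2 * S
r_indicator
```

Values close to 1 mean that the response rates differ little between regions. With more auxiliary variables the propensities would normally come from a model such as a logistic regression rather than from raw rates.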
HTH,
Jan
Quoting ghe...@mathnmaps.com:
An organization has asked me to comment on the validity of their
recent all-employee survey. Survey responses, by geographic region, compared
with the total number of employees in each region, were as follows:
ByRegion
          All.Employees Survey.Respondents
Region_1            735                142
Region_2            500                 83
Region_3            897                 78
Region_4            717                133
Region_5            167                 48
Region_6            309                  0
Region_7            806                125
Region_8            627                122
Region_9            858                177
Region_10           851                160
Region_11           336                 52
Region_12          1823                312
Region_13            80                  9
Region_14           774                121
Region_15           561                 24
Region_16           834                134
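
For anyone who wants to reproduce the numbers, the table can be rebuilt roughly as follows. This is a sketch: the original message does not show how ByRegion was created, so a data frame with these two columns is assumed, and the rate and overall objects are added here only to make the per-region response rates visible.

```r
# Rebuild the table as a data frame (assumed structure; counts as posted)
ByRegion <- data.frame(
  All.Employees      = c(735, 500, 897, 717, 167, 309, 806, 627,
                         858, 851, 336, 1823, 80, 774, 561, 834),
  Survey.Respondents = c(142, 83, 78, 133, 48, 0, 125, 122,
                         177, 160, 52, 312, 9, 121, 24, 134),
  row.names = paste0("Region_", 1:16)
)

# Response rate per region, kept outside the data frame so that
# chisq.test(ByRegion) still sees only the two count columns
rate <- with(ByRegion, Survey.Respondents / All.Employees)
names(rate) <- rownames(ByRegion)

# Overall response rate for comparison
overall <- sum(ByRegion$Survey.Respondents) / sum(ByRegion$All.Employees)

# Rates in percent, lowest first
round(100 * sort(rate), 1)
round(100 * overall, 1)
```

Sorting the rates puts Regions 6, 15 and 3 at the bottom, the same three regions that are singled out later in the message.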
How well does the survey represent the employee population?
Chi-square test says, not very well:
chisq.test(ByRegion)
Pearson's Chi-squared test
data: ByRegion
X-squared = 163.6869, df = 15, p-value < 2.2e-16
By striking three under-represented regions (3, 6, and 15), we get
a more reasonable, although still not convincing, result:
chisq.test(ByRegion[setdiff(1:16,c(3,6,15)),])
Pearson's Chi-squared test
data: ByRegion[setdiff(1:16, c(3, 6, 15)), ]
X-squared = 22.5643, df = 12, p-value = 0.03166
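
One caveat about the calls above: chisq.test(ByRegion) treats the two columns as separate groups, although the respondents are a subset of the employees. A sketch of two alternative formulations follows, with object names (emp, resp, gof, tab, homog) made up for the example: (a) a goodness-of-fit test of the respondent counts against the known employee shares, and (b) a test of equal response rates using respondents versus non-respondents. For these counts both still reject clearly, so the verdict does not change.

```r
# Region counts as posted (Region_1 ... Region_16)
emp  <- c(735, 500, 897, 717, 167, 309, 806, 627,
          858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)

# (a) Goodness of fit: do the respondents follow the employee shares,
#     with the shares treated as known population proportions?
gof <- chisq.test(x = resp, p = emp / sum(emp))

# (b) Homogeneity: is the response rate the same in every region?
#     Rows are regions, columns are responded / did not respond.
tab   <- cbind(Responded = resp, Did.Not.Respond = emp - resp)
homog <- chisq.test(tab)

gof
homog
```

The two statistics differ only by the factor 1 / (1 - overall response rate), so they tell the same story. Version (b) accounts for the fact that the population is finite and already partly sampled.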
This poses several questions:
1) Looking at a side-by-side barchart (proportion of responses vs.
proportion of employees, per region), the pattern of survey responses
appears, visually, to match fairly well the pattern of employees. Is
this a case where we trust the numbers and not the picture?
2) Part of the problem, ironically, is that there were too many responses
to the survey. If we had only one-tenth the responses, but in the same
proportions by region, the chi-square statistic would look much better
(though with a warning about possible inaccuracy):
data: data.frame(ByRegion$All.Employees, 0.1 *
(ByRegion$Survey.Respondents))
X-squared = 17.5912, df = 15, p-value = 0.2848
Is there a way of reconciling a large response rate with an unrepresentative
response profile? Or is the bad news that the survey will give very precise
results about a very ill-specified sub-population?
(Of course, I would put it in softer terms, like "you need to assess the degree
of homogeneity across different regions".)
3) Is Chi-squared really the right measure of how representative the
survey is?
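
On questions 2 and 3, one possible way to separate "how large is the mismatch" from "how sure are we that there is one" is to report an effect size next to the test. The sketch below is only a suggestion and not part of the original analysis. It computes a dissimilarity index (the share of respondents who would have to come from a different region for the profiles to match) and Cohen's w, and shows that both stay the same when the responses are cut to one-tenth in the same proportions. All object names are made up for the example.

```r
# Region counts as posted (Region_1 ... Region_16)
emp  <- c(735, 500, 897, 717, 167, 309, 806, 627,
          858, 851, 336, 1823, 80, 774, 561, 834)
resp <- c(142, 83, 78, 133, 48, 0, 125, 122,
          177, 160, 52, 312, 9, 121, 24, 134)

# Both effect sizes depend only on the regional shares, not on the counts
effect_sizes <- function(resp, emp) {
  p_emp  <- emp / sum(emp)     # employee share per region
  p_resp <- resp / sum(resp)   # respondent share per region
  c(D = 0.5 * sum(abs(p_resp - p_emp)),          # dissimilarity index, 0..1
    w = sqrt(sum((p_resp - p_emp)^2 / p_emp)))   # Cohen's w
}

full  <- effect_sizes(resp, emp)
tenth <- effect_sizes(0.1 * resp, emp)   # one-tenth of the responses

round(rbind(full, tenth), 3)
```

The p-value answers whether the mismatch could be chance, and with many responses even a modest mismatch is detected. The effect size answers whether the mismatch is large enough to matter, which is closer to what the bar chart shows.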
<<<<<<< >>>>>>>>>
Thanks for any help you can give - hope these questions make sense -
George H.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.