This does not appear to be a legitimate topic for r-help: it is not a consulting service. Please see the posting guide.
Of course, others may disagree and reply. Wouldn't be the first time I'm wrong.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, Mar 8, 2017 at 7:27 AM, <g.maub...@weinwolf.de> wrote:
> Hi All,
>
> today I have a more general question concerning the approach of storing
> different values from the analysis of multiple variables.
>
> My task is to compare distributions in a universe with distributions from
> the respondents using a whole bunch of variables. Comparison shall be done
> on relative frequencies (proportions).
>
> I was thinking about the structure I should store the results in and came
> up with the following:
>
> -- cut --
>
> library(stringi)
> library(ggplot2)  # needed for the ggplot() call further down
>
> # Result data frame
> # Some sort of tidy data set where
> # each value is stored as an identity.
> # This way all values for all variables could be stored in
> # one unique data structure.
> # If an additional variable is added for the name of the
> # research one could also build result data set across
> # surveys.
> # Values for measure could be "number" for 'raw' values or
> # "freq" for frequencies/counts.
> # Values for unit could be "n" for 'numbers' and
> # "%" for percentages.
> d_test <- data.frame(
>   group = rep(c("Universe", "Respondents"), each = 16),
>   variable = rep("State", 32),
>   value = rep(c(11.3, 12.7, 3.3, 5, 0.6, 8.1, 6.2, 5.8,
>                 6.4, 14.5, 8.3, 0.3, 3.8, 2.5, 8.1, 3), 2),
>   label = rep(c("Baden-Wuerttemberg",
>                 "Bayern",
>                 "Berlin",
>                 "Brandenburg",
>                 "Bremen",
>                 "Hamburg",
>                 "Hessen",
>                 "Mecklenburg-Vorpommern",
>                 "Niedersachsen",
>                 "Nordrhein-Westfalen",
>                 "Rheinland-Pfalz",
>                 "Saarland",
>                 "Sachsen",
>                 "Sachsen-Anhalt",
>                 "Schleswig-Holstein",
>                 "Thueringen"), 2),
>   measure = rep("freq", 32),
>   unit = rep("%", 32),
>   stringsAsFactors = FALSE
> )
>
> # This way the variables can be selected using simple
> # value selection from Base R functionality.
> data <- d_test[d_test$variable == "State", ]
>
> # And plot results for every variable.
> # Each "+" must end a line; a line that starts with "+" would end the
> # expression early.
> ggplot(
>   data = data,
>   aes(
>     x = label,
>     y = value,
>     fill = group)) +
>   geom_bar(stat = "identity", position = "dodge") +
>   theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
>   scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
>   scale_x_discrete(name = data$variable[1]) +
>   # value is numeric, so the y axis needs a continuous scale
>   scale_y_continuous(name = data$unit[1])
>
> -- cut --
>
> The reporting / presentation is done in R Markdown. I would load the
> result data set once at the beginning and run the comparisons as plots
> on each variable named in the results data set under "variable".
>
> If I follow this approach for my customer relationship survey, do you think I
> would face drawbacks or run into serious trouble?
>
> I am interested in your opinion and open for other approaches and
> suggestions.
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
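
For what it is worth, the long layout described in the quoted message lends itself to a simple per-variable loop. The following is only a minimal sketch in base R under stated assumptions: the `results` data frame holds made-up percentages, the second variable ("Gender") is invented for illustration, and `compare_groups()` is a hypothetical helper, not something from the original post or from any package.

```r
# Made-up long-format results: one row per group / variable / label.
results <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 4),
  variable = rep(c("State", "State", "Gender", "Gender"), 2),
  label    = rep(c("Bayern", "Berlin", "female", "male"), 2),
  value    = c(60, 40, 50, 50, 65, 35, 45, 55),  # illustrative percentages
  stringsAsFactors = FALSE
)

# Hypothetical helper: put both groups side by side for one variable
# and compute the difference in percentage points.
compare_groups <- function(d, var) {
  sub <- d[d$variable == var, ]
  # Cross-tabulate value by label (rows) and group (columns).
  # Note: a label missing in one group would show up as 0 here.
  tab <- xtabs(value ~ label + group, data = sub)
  data.frame(
    label       = rownames(tab),
    universe    = as.numeric(tab[, "Universe"]),
    respondents = as.numeric(tab[, "Respondents"]),
    difference  = as.numeric(tab[, "Respondents"] - tab[, "Universe"]),
    stringsAsFactors = FALSE
  )
}

# One comparison table per variable found in the results data set.
comparisons <- lapply(
  setNames(nm = unique(results$variable)),
  function(v) compare_groups(results, v)
)
```

In an R Markdown report, each element of `comparisons` could then be printed or plotted in its own chunk; whether a list of tables or a single long data frame is more convenient depends on how the report is organised.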