Did you not see Mark Leeds's post? The OP apparently did not really mean R's "ordered factors" as produced by the R ordered() constructor; rather, he meant "factors with levels ordered differently than the default", for which Rich's answer was apropos. Mine -- and now yours -- in which we wrongly assumed the OP knew what he was saying by "ordered factor", was not.
-- Bert On Mon, Dec 2, 2013 at 11:51 AM, Robert Baer <rb...@atsu.edu> wrote: > > On 12/2/2013 9:35 AM, Bert Gunter wrote: > > Not true, Rich. > > The point about alphabetical ordering explains why the author likely > explicitly set the levels for the factor, though. > > As to why ordered factors, we may never know, but one possible explanation > is that at some point he was going to use statistics where he wanted to use > polynomial contrasts. See > > options()$contrasts > > Note that the default contrast type differs for normal factors and ordered > factors. > > > > z <-factor(letters[1:3],lev=letters[3:1]) > sort(z) > > [1] c b a > Levels: c b a > > What you say is true only for the **default** sort order. > > (Although maybe the code author didn't realize this either) > > -- Bert > > > On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger <r...@temple.edu> wrote: > > If days of the week is not an Ordered Factor, then it will be sorted > alphabetically. > Fr Mo Sa Su Th Tu We > > Rich > > On Mon, Dec 2, 2013 at 6:24 AM, Bill <william...@gmail.com> wrote: > > I am reading the code below. It acts on a csv file called dodgers.csv with > the following variables. > > > print(str(dodgers)) # check the structure of the data frame > > 'data.frame': 81 obs. of 12 variables: > $ month : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1 > 1 ... > $ day : int 10 11 12 13 14 15 23 24 25 27 ... > $ attend : int 56000 29729 28328 31601 46549 38359 26376 44014 26345 > 44807 ... > $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7 > 1 ... > $ opponent : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11 > 3 3 3 10 ... > $ temp : int 67 58 57 54 57 65 60 63 64 66 ... > $ skies : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1 > ... > $ day_night : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ... > $ cap : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ... > $ shirt : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ... > $ fireworks : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ... > $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ... > NULL > > I don't understand why the author of the code decided to make the factor > days_of_week into an ordered factor. Anyone know why this should be done? > Thank you. > > Here is the code: > > # Predictive Model for Los Angeles Dodgers Promotion and Attendance > > library(car) # special functions for linear regression > library(lattice) # graphics package > > # read in data and create a data frame called dodgers > dodgers <- read.csv("dodgers.csv") > print(str(dodgers)) # check the structure of the data frame > > # define an ordered day-of-week variable > # for plots and data summaries > dodgers$ordered_day_of_week <- with(data=dodgers, > ifelse ((day_of_week == "Monday"),1, > ifelse ((day_of_week == "Tuesday"),2, > ifelse ((day_of_week == "Wednesday"),3, > ifelse ((day_of_week == "Thursday"),4, > ifelse ((day_of_week == "Friday"),5, > ifelse ((day_of_week == "Saturday"),6,7))))))) > dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week, > levels=1:7, > labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun")) > > # exploratory data analysis with standard graphics: attendance by day of > week > with(data=dodgers,plot(ordered_day_of_week, attend/1000, > xlab = "Day of Week", ylab = "Attendance (thousands)", > col = "violet", las = 1)) > > # when do the Dodgers use bobblehead promotions > with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on > Tuesday > > # define an ordered month variable > # for plots and data summaries > dodgers$ordered_month <- with(data=dodgers, > ifelse ((month == "APR"),4, > ifelse ((month == "MAY"),5, > ifelse ((month == "JUN"),6, > ifelse ((month == "JUL"),7, > ifelse ((month == "AUG"),8, > ifelse ((month == "SEP"),9,10))))))) > dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10, > labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct")) > > # exploratory data analysis with standard R graphics: attendance by month > with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month", > ylab = "Attendance (thousands)", col = "light blue", las = 1)) > > # exploratory data analysis displaying many variables > # looking at attendance and conditioning on day/night > # the skies and whether or not fireworks are displayed > library(lattice) # used for plotting > # let us prepare a graphical summary of the dodgers data > group.labels <- c("No Fireworks","Fireworks") > group.symbols <- c(21,24) > group.colors <- c("black","black") > group.fill <- c("black","red") > xyplot(attend/1000 ~ temp | skies + day_night, > data = dodgers, groups = fireworks, pch = group.symbols, > aspect = 1, cex = 1.5, col = group.colors, fill = group.fill, > layout = c(2, 2), type = c("p","g"), > strip=strip.custom(strip.levels=TRUE,strip.names=FALSE, style=1), > xlab = "Temperature (Degrees Fahrenheit)", > ylab = "Attendance (thousands)", > key = list(space = "top", > text = list(rev(group.labels),col = rev(group.colors)), > points = list(pch = rev(group.symbols), col = rev(group.colors), > fill = rev(group.fill)))) > > # attendance by opponent and day/night game > group.labels <- c("Day","Night") > group.symbols <- c(1,20) > group.symbols.size <- c(2,2.75) > bwplot(opponent ~ attend/1000, data = dodgers, groups = day_night, > xlab = "Attendance (thousands)", > panel = function(x, y, groups, subscripts, ...) > {panel.grid(h = (length(levels(dodgers$opponent)) - 1), v = -1) > panel.stripplot(x, y, groups = groups, subscripts = subscripts, > cex = group.symbols.size, pch = group.symbols, col = "darkblue") > }, > key = list(space = "top", > text = list(group.labels,col = "black"), > points = list(pch = group.symbols, cex = group.symbols.size, > col = "darkblue"))) > > # specify a simple model with bobblehead entered last > my.model <- {attend ~ ordered_month + ordered_day_of_week + bobblehead} > > # employ a training-and-test regimen > set.seed(1234) # set seed for repeatability of training-and-test split > training_test <- c(rep(1,length=trunc((2/3)*nrow(dodgers))), > rep(2,length=(nrow(dodgers) - trunc((2/3)*nrow(dodgers))))) > dodgers$training_test <- sample(training_test) # random permutation > dodgers$training_test <- factor(dodgers$training_test, > levels=c(1,2), labels=c("TRAIN","TEST")) > dodgers.train <- subset(dodgers, training_test == "TRAIN") > print(str(dodgers.train)) # check training data frame > dodgers.test <- subset(dodgers, training_test == "TEST") > print(str(dodgers.test)) # check test data frame > > # fit the model to the training set > train.model.fit <- lm(my.model, data = dodgers.train) > # obtain predictions from the training set > dodgers.train$predict_attend <- predict(train.model.fit) > > # evaluate the fitted model on the test set > dodgers.test$predict_attend <- predict(train.model.fit, > newdata = dodgers.test) > > # compute the proportion of response variance > # accounted for when predicting out-of-sample > cat("\n","Proportion of Test Set Variance Accounted for: ", > round((with(dodgers.test,cor(attend,predict_attend)^2)), > digits=3),"\n",sep="") > > # merge the training and test sets for plotting > dodgers.plotting.frame <- rbind(dodgers.train,dodgers.test) > > # generate predictive modeling visual for management > group.labels <- c("No Bobbleheads","Bobbleheads") > group.symbols <- c(21,24) > group.colors <- c("black","black") > group.fill <- c("black","red") > xyplot(predict_attend/1000 ~ attend/1000 | training_test, > data = dodgers.plotting.frame, groups = bobblehead, cex = 2, > pch = group.symbols, col = group.colors, fill = group.fill, > layout = c(2, 1), xlim = c(20,65), ylim = c(20,65), > aspect=1, type = c("p","g"), > panel=function(x,y, ...) > {panel.xyplot(x,y,...) > panel.segments(25,25,60,60,col="black",cex=2) > }, > strip=function(...) strip.default(..., style=1), > xlab = "Actual Attendance (thousands)", > ylab = "Predicted Attendance (thousands)", > key = list(space = "top", > text = list(rev(group.labels),col = rev(group.colors)), > points = list(pch = rev(group.symbols), > col = rev(group.colors), > fill = rev(group.fill)))) > > # use the full data set to obtain an estimate of the increase in > # attendance due to bobbleheads, controlling for other factors > my.model.fit <- lm(my.model, data = dodgers) # use all available data > print(summary(my.model.fit)) > # tests statistical significance of the bobblehead promotion > # type I anova computes sums of squares for sequential tests > print(anova(my.model.fit)) > > cat("\n","Estimated Effect of Bobblehead Promotion on Attendance: ", > round(my.model.fit$coefficients[length(my.model.fit$coefficients)], > digits = 0),"\n",sep="") > > # standard graphics provide diagnostic plots > plot(my.model.fit) > > # additional model diagnostics drawn from the car package > library(car) > residualPlots(my.model.fit) > marginalModelPlots(my.model.fit) > print(outlierTest(my.model.fit)) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > -- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.