Re: [R] why change days of the week from a factor to an ordered factor?

Bill Mon, 02 Dec 2013 16:35:08 -0800

Thanks to all, still a bit of a mystery. It is for the graphing I guess but
not sure why. I don't think there is too much sense to saying that Tuesday
precedes Wednesday but there could, as some of you suggest, be
circumstances where this is useful.



On Mon, Dec 2, 2013 at 12:00 PM, Bert Gunter <gunter.ber...@gene.com> wrote:

> Did you not see Mark Leeds's post?
>
> The OP apparently did not really mean R's "ordered factors" as
> produced by the R ordered() constructor; rather, he meant "factors
> with levels ordered differently than the default", for which Rich's
> answer was apropos. Mine -- and now yours -- in which we wrongly
> assumed the OP knew what he was saying by "ordered factor", was not.
>
> -- Bert
>
> On Mon, Dec 2, 2013 at 11:51 AM, Robert Baer <rb...@atsu.edu> wrote:
> >
> > On 12/2/2013 9:35 AM, Bert Gunter wrote:
> >
> > Not true, Rich.
> >
> > The point about alphabetical ordering explains why the author likely
> > explicitly set the levels for the factor, though.
> >
> > As to why ordered factors, we may never know, but one possible
> explanation
> > is that at some point he was going to use statistics where he wanted to
> use
> > polynomial contrasts. See
> >
> > options()$contrasts
> >
> > Note that the default contrast type differs for normal factors and
> ordered
> > factors.
> >
> >
> >
> > z <-factor(letters[1:3],lev=letters[3:1])
> > sort(z)
> >
> > [1] c b a
> > Levels: c b a
> >
> > What you say is true only for the **default** sort order.
> >
> > (Although maybe the code author didn't realize this either)
> >
> > -- Bert
> >
> >
> > On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger <r...@temple.edu>
> wrote:
> >
> > If days of the week is not an Ordered Factor, then it will be sorted
> > alphabetically.
> > Fr Mo Sa Su Th Tu We
> >
> > Rich
> >
> > On Mon, Dec 2, 2013 at 6:24 AM, Bill <william...@gmail.com> wrote:
> >
> > I am reading the code below. It acts on a csv file called dodgers.csv
> with
> > the following variables.
> >
> >
> > print(str(dodgers))  # check the structure of the data frame
> >
> > 'data.frame':   81 obs. of  12 variables:
> >  $ month      : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1
> 1
> > 1 ...
> >  $ day        : int  10 11 12 13 14 15 23 24 25 27 ...
> >  $ attend     : int  56000 29729 28328 31601 46549 38359 26376 44014
> 26345
> > 44807 ...
> >  $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6
> 7
> > 1 ...
> >  $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11
> 11
> > 3 3 3 10 ...
> >  $ temp       : int  67 58 57 54 57 65 60 63 64 66 ...
> >  $ skies      : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1
> > ...
> >  $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
> >  $ cap        : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> >  $ shirt      : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> >  $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
> >  $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> > NULL
> >
> > I don't understand why the author of the code decided to make the factor
> > days_of_week into an ordered factor. Anyone know why this should be done?
> > Thank you.
> >
> > Here is the code:
> >
> > # Predictive Model for Los Angeles Dodgers Promotion and Attendance
> >
> > library(car)  # special functions for linear regression
> > library(lattice)  # graphics package
> >
> > # read in data and create a data frame called dodgers
> > dodgers <- read.csv("dodgers.csv")
> > print(str(dodgers))  # check the structure of the data frame
> >
> > # define an ordered day-of-week variable
> > # for plots and data summaries
> > dodgers$ordered_day_of_week <- with(data=dodgers,
> >   ifelse ((day_of_week == "Monday"),1,
> >   ifelse ((day_of_week == "Tuesday"),2,
> >   ifelse ((day_of_week == "Wednesday"),3,
> >   ifelse ((day_of_week == "Thursday"),4,
> >   ifelse ((day_of_week == "Friday"),5,
> >   ifelse ((day_of_week == "Saturday"),6,7)))))))
> > dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
> > levels=1:7,
> > labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
> >
> > # exploratory data analysis with standard graphics: attendance by day of
> > week
> > with(data=dodgers,plot(ordered_day_of_week, attend/1000,
> > xlab = "Day of Week", ylab = "Attendance (thousands)",
> > col = "violet", las = 1))
> >
> > # when do the Dodgers use bobblehead promotions
> > with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on
> > Tuesday
> >
> > # define an ordered month variable
> > # for plots and data summaries
> > dodgers$ordered_month <- with(data=dodgers,
> >   ifelse ((month == "APR"),4,
> >   ifelse ((month == "MAY"),5,
> >   ifelse ((month == "JUN"),6,
> >   ifelse ((month == "JUL"),7,
> >   ifelse ((month == "AUG"),8,
> >   ifelse ((month == "SEP"),9,10)))))))
> > dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
> > labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))
> >
> > # exploratory data analysis with standard R graphics: attendance by month
> > with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month",
> > ylab = "Attendance (thousands)", col = "light blue", las = 1))
> >
> > # exploratory data analysis displaying many variables
> > # looking at attendance and conditioning on day/night
> > # the skies and whether or not fireworks are displayed
> > library(lattice) # used for plotting
> > # let us prepare a graphical summary of the dodgers data
> > group.labels <- c("No Fireworks","Fireworks")
> > group.symbols <- c(21,24)
> > group.colors <- c("black","black")
> > group.fill <- c("black","red")
> > xyplot(attend/1000 ~ temp | skies + day_night,
> >     data = dodgers, groups = fireworks, pch = group.symbols,
> >     aspect = 1, cex = 1.5, col = group.colors, fill = group.fill,
> >     layout = c(2, 2), type = c("p","g"),
> >     strip=strip.custom(strip.levels=TRUE,strip.names=FALSE, style=1),
> >     xlab = "Temperature (Degrees Fahrenheit)",
> >     ylab = "Attendance (thousands)",
> >     key = list(space = "top",
> >         text = list(rev(group.labels),col = rev(group.colors)),
> >         points = list(pch = rev(group.symbols), col = rev(group.colors),
> >         fill = rev(group.fill))))
> >
> > # attendance by opponent and day/night game
> > group.labels <- c("Day","Night")
> > group.symbols <- c(1,20)
> > group.symbols.size <- c(2,2.75)
> > bwplot(opponent ~ attend/1000, data = dodgers, groups = day_night,
> >     xlab = "Attendance (thousands)",
> >     panel = function(x, y, groups, subscripts, ...)
> >        {panel.grid(h = (length(levels(dodgers$opponent)) - 1), v = -1)
> >         panel.stripplot(x, y, groups = groups, subscripts = subscripts,
> >         cex = group.symbols.size, pch = group.symbols, col = "darkblue")
> >        },
> >     key = list(space = "top",
> >     text = list(group.labels,col = "black"),
> >     points = list(pch = group.symbols, cex = group.symbols.size,
> >     col = "darkblue")))
> >
> > # specify a simple model with bobblehead entered last
> > my.model <- {attend ~ ordered_month + ordered_day_of_week + bobblehead}
> >
> > # employ a training-and-test regimen
> > set.seed(1234) # set seed for repeatability of training-and-test split
> > training_test <- c(rep(1,length=trunc((2/3)*nrow(dodgers))),
> > rep(2,length=(nrow(dodgers) - trunc((2/3)*nrow(dodgers)))))
> > dodgers$training_test <- sample(training_test) # random permutation
> > dodgers$training_test <- factor(dodgers$training_test,
> >   levels=c(1,2), labels=c("TRAIN","TEST"))
> > dodgers.train <- subset(dodgers, training_test == "TRAIN")
> > print(str(dodgers.train)) # check training data frame
> > dodgers.test <- subset(dodgers, training_test == "TEST")
> > print(str(dodgers.test)) # check test data frame
> >
> > # fit the model to the training set
> > train.model.fit <- lm(my.model, data = dodgers.train)
> > # obtain predictions from the training set
> > dodgers.train$predict_attend <- predict(train.model.fit)
> >
> > # evaluate the fitted model on the test set
> > dodgers.test$predict_attend <- predict(train.model.fit,
> >   newdata = dodgers.test)
> >
> > # compute the proportion of response variance
> > # accounted for when predicting out-of-sample
> > cat("\n","Proportion of Test Set Variance Accounted for: ",
> > round((with(dodgers.test,cor(attend,predict_attend)^2)),
> >   digits=3),"\n",sep="")
> >
> > # merge the training and test sets for plotting
> > dodgers.plotting.frame <- rbind(dodgers.train,dodgers.test)
> >
> > # generate predictive modeling visual for management
> > group.labels <- c("No Bobbleheads","Bobbleheads")
> > group.symbols <- c(21,24)
> > group.colors <- c("black","black")
> > group.fill <- c("black","red")
> > xyplot(predict_attend/1000 ~ attend/1000 | training_test,
> >        data = dodgers.plotting.frame, groups = bobblehead, cex = 2,
> >        pch = group.symbols, col = group.colors, fill = group.fill,
> >        layout = c(2, 1), xlim = c(20,65), ylim = c(20,65),
> >        aspect=1, type = c("p","g"),
> >        panel=function(x,y, ...)
> >             {panel.xyplot(x,y,...)
> >              panel.segments(25,25,60,60,col="black",cex=2)
> >             },
> >        strip=function(...) strip.default(..., style=1),
> >        xlab = "Actual Attendance (thousands)",
> >        ylab = "Predicted Attendance (thousands)",
> >        key = list(space = "top",
> >               text = list(rev(group.labels),col = rev(group.colors)),
> >               points = list(pch = rev(group.symbols),
> >               col = rev(group.colors),
> >               fill = rev(group.fill))))
> >
> > # use the full data set to obtain an estimate of the increase in
> > # attendance due to bobbleheads, controlling for other factors
> > my.model.fit <- lm(my.model, data = dodgers)  # use all available data
> > print(summary(my.model.fit))
> > # tests statistical significance of the bobblehead promotion
> > # type I anova computes sums of squares for sequential tests
> > print(anova(my.model.fit))
> >
> > cat("\n","Estimated Effect of Bobblehead Promotion on Attendance: ",
> > round(my.model.fit$coefficients[length(my.model.fit$coefficients)],
> > digits = 0),"\n",sep="")
> >
> > # standard graphics provide diagnostic plots
> > plot(my.model.fit)
> >
> > # additional model diagnostics drawn from the car package
> > library(car)
> > residualPlots(my.model.fit)
> > marginalModelPlots(my.model.fit)
> > print(outlierTest(my.model.fit))
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> (650) 467-7374
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] why change days of the week from a factor to an ordered factor?

Reply via email to