Re: [R] XLConnect readWorksheet comma decimal sign

2013-12-02 Thread Knut Krueger

On 29.11.2013 20:39, David Winsemius wrote:

>> That's impossible, we are used to hitting the comma

> I don't know what that means.

It is common here that the decimal sign is the comma.
All computers in the CIP pools use the comma (and, I think, 99.9%
of all other computers here).

Can you imagine what would happen after changing this to a dot?
Or, the other way round, try to get the people in your country to use the
comma as the decimal separator. It would cause a big jumble.


> Until you show a reproducible example, we will not be able to offer
> further advice:

That's the problem ... I am still trying to find out what happened. It
was definitely wrong in two cases.

I was sure that I had found the reason when starting this thread...
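
The thread never pins down what went wrong in the two bad cases, so the following is only a base R sketch of how comma decimals are commonly handled once values arrive as text. It does not use XLConnect, and the sample strings are made up for illustration.

```r
# Illustrative strings with a comma as decimal sign (made-up values)
raw <- c("1,5", "2,25", "10")

# Option 1: swap the comma for a dot, then convert
num_sub <- as.numeric(sub(",", ".", raw, fixed = TRUE))

# Option 2: let type.convert() handle it through its 'dec' argument
num_tc <- type.convert(raw, dec = ",", as.is = TRUE)

# Option 3: for semicolon-separated text, read.csv2() assumes dec = ","
df <- read.csv2(text = "a;b\n1,5;2\n2,25;3")
str(df)
```

If a column still comes back as character after import, checking it with str() before converting is usually the quickest way to see which cells carry the unexpected separator.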

Knut

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Confidence interval, multiple imputation

2013-12-02 Thread Mª Teresa Martinez Soriano


Hi everyone,

I have a big data set where rows are observations and columns are
variables. It contains a lot of missing values. I have used multiple
imputation with the mice package and I get an “exact” prediction for each
missing value. Now I would like to know the error of these predictions, or
a confidence interval for them.

How can I get this?

This is part of my code:

library(mice)

mod1 <- mice(dat, method = c("", "", rep("pmm", 6)))

ro <- round(cor(dat, use = "pair"), 3)

# this predictor matrix is built from the correlations
predictor <- quickpred(dat)

mod1 <- mice(dat, method = c("", "", rep("pmm", 6)), pred = predictor)

imputados <- complete(mod1, 'long')

# one completed data set per imputation
x.imp <- split(imputados, imputados$.imp)

# sum the completed data sets, dropping the .imp and .id columns
acumula <- x.imp[[1]][, -c(1, 2)]
for (j in 2:length(x.imp)) {
  acumula <- acumula + x.imp[[j]][, -c(1, 2)]
}

# average over the imputations (5 is the default number in mice)
med.imp <- acumula / 5

Thanks in advance
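
One common way to attach uncertainty to multiply imputed results is Rubin's rules, which pool an estimate computed on each completed data set instead of averaging the imputed cells. The base R sketch below is only an illustration: `est` and `se` are made-up numbers standing in for an estimate and its standard error from each of 5 completed data sets, and the normal quantile is a simplification (the usual reference distribution is a t with adjusted degrees of freedom). In mice itself, I believe `with()` followed by `pool()` performs this pooling for fitted models.

```r
# Made-up per-imputation results (m = 5): an estimate and its standard error
est <- c(10.2, 9.8, 10.5, 10.1, 9.9)
se  <- c(0.50, 0.55, 0.48, 0.52, 0.51)
m   <- length(est)

q_bar <- mean(est)                # pooled estimate
u_bar <- mean(se^2)               # within-imputation variance
b     <- var(est)                 # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b  # total variance

# Approximate 95% interval (normal quantile used for simplicity)
ci <- q_bar + c(-1, 1) * qnorm(0.975) * sqrt(t_var)
ci
```

The between-imputation term is what reflects the uncertainty due to the missing values; an average of the imputed cells alone discards it.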
  


[R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Bill
I am reading the code below. It acts on a csv file called dodgers.csv with
the following variables.


> print(str(dodgers))  # check the structure of the data frame
'data.frame':   81 obs. of  12 variables:
 $ month      : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ day        : int  10 11 12 13 14 15 23 24 25 27 ...
 $ attend     : int  56000 29729 28328 31601 46549 38359 26376 44014 26345 44807 ...
 $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7 1 ...
 $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11 3 3 3 10 ...
 $ temp       : int  67 58 57 54 57 65 60 63 64 66 ...
 $ skies      : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1 ...
 $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
 $ cap        : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ shirt      : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
 $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
NULL
>

I don't understand why the author of the code decided to make the factor
days_of_week into an ordered factor. Anyone know why this should be done?
Thank you.

Here is the code:

# Predictive Model for Los Angeles Dodgers Promotion and Attendance

library(car)  # special functions for linear regression
library(lattice)  # graphics package

# read in data and create a data frame called dodgers
dodgers <- read.csv("dodgers.csv")
print(str(dodgers))  # check the structure of the data frame

# define an ordered day-of-week variable
# for plots and data summaries
dodgers$ordered_day_of_week <- with(data=dodgers,
  ifelse ((day_of_week == "Monday"),1,
  ifelse ((day_of_week == "Tuesday"),2,
  ifelse ((day_of_week == "Wednesday"),3,
  ifelse ((day_of_week == "Thursday"),4,
  ifelse ((day_of_week == "Friday"),5,
  ifelse ((day_of_week == "Saturday"),6,7)))))))
dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
  levels=1:7,
  labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))

# exploratory data analysis with standard graphics: attendance by day of week
with(data=dodgers,plot(ordered_day_of_week, attend/1000,
xlab = "Day of Week", ylab = "Attendance (thousands)",
col = "violet", las = 1))

# when do the Dodgers use bobblehead promotions
with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on Tuesday

# define an ordered month variable
# for plots and data summaries
dodgers$ordered_month <- with(data=dodgers,
  ifelse ((month == "APR"),4,
  ifelse ((month == "MAY"),5,
  ifelse ((month == "JUN"),6,
  ifelse ((month == "JUL"),7,
  ifelse ((month == "AUG"),8,
  ifelse ((month == "SEP"),9,10)))))))
dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
  labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))

# exploratory data analysis with standard R graphics: attendance by month
with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month",
ylab = "Attendance (thousands)", col = "light blue", las = 1))

# exploratory data analysis displaying many variables
# looking at attendance and conditioning on day/night
# the skies and whether or not fireworks are displayed
library(lattice) # used for plotting
# let us prepare a graphical summary of the dodgers data
group.labels <- c("No Fireworks","Fireworks")
group.symbols <- c(21,24)
group.colors <- c("black","black")
group.fill <- c("black","red")
xyplot(attend/1000 ~ temp | skies + day_night,
data = dodgers, groups = fireworks, pch = group.symbols,
aspect = 1, cex = 1.5, col = group.colors, fill = group.fill,
layout = c(2, 2), type = c("p","g"),
strip=strip.custom(strip.levels=TRUE,strip.names=FALSE, style=1),
xlab = "Temperature (Degrees Fahrenheit)",
ylab = "Attendance (thousands)",
key = list(space = "top",
text = list(rev(group.labels),col = rev(group.colors)),
points = list(pch = rev(group.symbols), col = rev(group.colors),
fill = rev(group.fill))))

# attendance by opponent and day/night game
group.labels <- c("Day","Night")
group.symbols <- c(1,20)
group.symbols.size <- c(2,2.75)
bwplot(opponent ~ attend/1000, data = dodgers, groups = day_night,
xlab = "Attendance (thousands)",
panel = function(x, y, groups, subscripts, ...)
   {panel.grid(h = (length(levels(dodgers$opponent)) - 1), v = -1)
panel.stripplot(x, y, groups = groups, subscripts = subscripts,
cex = group.symbols.size, pch = group.symbols, col = "darkblue")
   },
key = list(space = "top",
text = list(group.labels,col = "black"),
points = list(pch = group.symbols, cex = group.symbols.size,
col = "darkblue")))

# specify a simple model with bobblehead entered last
my.model <- {attend ~ ordered_month + ordered_day_of_week + bobblehead}

# employ a training-and-test regimen
set.seed(1234) # set seed for repeatability of training-and-test split
training_test <- c(rep(1,length=tr

[R] interpretation of MDS plot in random forest

2013-12-02 Thread Massimo Bressan

Given this general example:

library(randomForest)

set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)

#varImpPlot(iris.rf)
#varUsed(iris.rf)

MDSplot(iris.rf, iris$Species)

I’ve been reading the documentation about random forests (to the best of
my limited knowledge), but I’m having trouble interpreting the MDS plot
correctly, and I hope someone can give me some clues.

What is meant by “the scaling coordinates of the proximity matrix”?

I think I understand that the objective here is to present the distances
among species in a parsimonious, visual way (of lower dimensionality).

Is there therefore a parallel with the principal components of a
classical PCA?

Are the scaling coordinates DIM1 and DIM2 the eigenvectors of the
proximity matrix?

If that is correct, how would you find the eigenvalues for those
eigenvectors? And what do the eigenvalues represent?

What do these two dimensions in the plot say about the different iris
species? Their relative distance in terms of proximity within the space
spanned by DIM1 and DIM2?

How should one choose the k parameter (number of dimensions for the
scaling coordinates)?

And finally, how would you explain the plot in simple terms?

Thank you for any feedback
Best regards
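
Several of these questions concern classical multidimensional scaling, which can be tried in base R with cmdscale(). The sketch below is an illustration under assumptions: it uses Euclidean distances between the iris measurements as a stand-in dissimilarity so that it runs without a fitted forest. For a forest, one would pass 1 - iris.rf$proximity instead, which I believe is what MDSplot() does internally. The point shown is that the scaling coordinates are eigenvectors of the doubly centred squared-dissimilarity matrix, scaled by the square roots of their eigenvalues.

```r
# Stand-in dissimilarity (assumption: Euclidean distance, not forest proximity)
d <- dist(iris[, 1:4])

# Classical MDS with k = 2 dimensions; eig = TRUE also returns eigenvalues
mds <- cmdscale(d, k = 2, eig = TRUE)

# The same coordinates by hand: double-centre the squared distances ...
D2 <- as.matrix(d)^2
n  <- nrow(D2)
J  <- diag(n) - matrix(1 / n, n, n)
B  <- -0.5 * J %*% D2 %*% J

# ... and scale the leading eigenvectors by sqrt(eigenvalue)
e      <- eigen(B, symmetric = TRUE)
coords <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Eigenvector signs are arbitrary, so compare absolute values
max_diff <- max(abs(abs(coords) - abs(mds$points)))

# Share of the total "variation" captured by the first k dimensions
mds$GOF
```

The eigenvalues play a role similar to the variances of principal components, so the goodness-of-fit values give one rough guide for choosing k: add dimensions until the gain becomes small.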



[R] generate multiple probability distributions

2013-12-02 Thread Michael Friendly
I want to generate a collection of probability distributions in a data
frame, with varying parameters.  There must be some simpler way than what
I have below (avoiding rbind and cbind), but I can't quite see it.

x <- seq(0,12)
bin.df <- as.data.frame(
rbind( cbind(x, prob=dbinom(x,12,1/6), p=1/6),
   cbind(x, prob=dbinom(x,12,1/3), p=1/3),
   cbind(x, prob=dbinom(x,12,1/2), p=1/2),
   cbind(x, prob=dbinom(x,12,2/3), p=2/3)
  ))
bin.df$p <- factor(bin.df$p, labels=c("1/6", "1/3", "1/2", "2/3"))
str(bin.df)

> str(bin.df)
'data.frame':   52 obs. of  3 variables:
 $ x   : num  0 1 2 3 4 5 6 7 8 9 ...
 $ prob: num  0.1122 0.2692 0.2961 0.1974 0.0888 ...
 $ p   : Factor w/ 4 levels "1/6","1/3","1/2",..: 1 1 1 1 1 1 1 1 1 1 ...
>

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [R] generate multiple probability distributions

2013-12-02 Thread Duncan Murdoch

On 02/12/2013 9:47 AM, Michael Friendly wrote:

I want to generate a collection of probability distributions in a data
frame, with
varying parameters.  There must be some simpler way than what I have below
(avoiding rbind and cbind), but I can't quite see it.

x <- seq(0,12)
bin.df <- as.data.frame(
  rbind( cbind(x, prob=dbinom(x,12,1/6), p=1/6),
 cbind(x, prob=dbinom(x,12,1/3), p=1/3),
 cbind(x, prob=dbinom(x,12,1/2), p=1/2),
 cbind(x, prob=dbinom(x,12,2/3), p=2/3)
))
bin.df$p <- factor(bin.df$p, labels=c("1/6", "1/3", "1/2", "2/3"))
str(bin.df)

  > str(bin.df)
'data.frame':   52 obs. of  3 variables:
   $ x   : num  0 1 2 3 4 5 6 7 8 9 ...
   $ prob: num  0.1122 0.2692 0.2961 0.1974 0.0888 ...
   $ p   : Factor w/ 4 levels "1/6","1/3","1/2",..: 1 1 1 1 1 1 1 1 1 1 ...
  >



dbinom can take vector inputs for the parameters, so this would be a bit 
simpler:


x <- seq(0,12)
x <- rep(x, 4)
p <- rep(c(1/6, 1/3, 1/2, 2/3), each=13)
bin.df <- data.frame(x, prob = dbinom(x, 12, p), p)
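
A variant of the same idea, sketched with expand.grid() so that the x and p combinations are generated rather than written out with rep(). The name `n` is introduced here for illustration, and the column order (x, p, prob) differs from the original data frame.

```r
n <- 12  # number of trials, as in the original example

# All combinations of x and p; x varies fastest
bin.df <- expand.grid(x = 0:n, p = c(1/6, 1/3, 1/2, 2/3))

# dbinom() is vectorised over both x and prob
bin.df$prob <- dbinom(bin.df$x, n, bin.df$p)

# Label p as in the original post (levels sort numerically, so labels line up)
bin.df$p <- factor(bin.df$p, labels = c("1/6", "1/3", "1/2", "2/3"))
str(bin.df)
```

A quick sanity check is that the probabilities within each level of p sum to 1, for example with tapply(bin.df$prob, bin.df$p, sum).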

Duncan Murdoch



Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Bert Gunter
"BIll" :

(Sorry -- Doubt that this will be helpful, but I couldn't resist)

"I don't understand why the author of the code decided to make the factor
days_of_week into an ordered factor. Anyone know why this should be done?"

A definitive answer would require either psychic abilities or asking
the author/maintainer of the code. I suggest you try the latter.

However, insight might be gained by **you** answering the following
question: What is the difference between ordered and unordered
factors? Note that one might expect some results that change with day
of week to do so in an "orderly" way.  For example, I would imagine
that grocery purchases are at more or less one level on M-TH and at a
higher level on Fri-Sun in the U.S. Ordered factors would be better
at capturing this sort of thing, I would think (with fewer df).
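
For readers who want to see the difference in a model context, here is a small base R sketch, assuming the default contrast options. It shows that an unordered factor gets treatment (dummy) contrasts while an ordered factor gets polynomial contrasts. Both codings use the same number of columns for 7 levels, so any saving in degrees of freedom would come from keeping only the low-order polynomial terms, which is a modelling choice not shown in this thread.

```r
days <- c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun")

f <- factor(days, levels = days)                  # unordered, calendar order
o <- factor(days, levels = days, ordered = TRUE)  # ordered

colnames(contrasts(f))  # treatment contrasts: one dummy per non-baseline level
colnames(contrasts(o))  # polynomial contrasts: .L, .Q, .C, ^4, ...

# Same number of contrast columns in both cases
ncol(contrasts(f)) == ncol(contrasts(o))
```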

Cheers,
Bert

On Mon, Dec 2, 2013 at 3:24 AM, Bill wrote:
> I don't understand why the author of the code decided to make the factor
> days_of_week into an ordered factor. Anyone know why this should be done?
> Thank you.

Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Richard M. Heiberger
If days of the week is not an Ordered Factor, then it will be sorted
alphabetically.
Fr Mo Sa Su Th Tu We

Rich

On Mon, Dec 2, 2013 at 6:24 AM, Bill wrote:
> I don't understand why the author of the code decided to make the factor
> days_of_week into an ordered factor. Anyone know why this should be done?
> Thank you.

Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Bert Gunter
Not true, Rich.

> z <-factor(letters[1:3],lev=letters[3:1])
> sort(z)
[1] c b a
Levels: c b a

What you say is true only for the **default** sort order.

(Although maybe the code author didn't realize this either)
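
The same point with day names, as a small sketch (the sample vector `x` is made up): setting the levels controls the sort and display order without making the factor ordered.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
x <- c("Friday", "Monday", "Sunday", "Tuesday")  # made-up sample

f_default <- factor(x)                 # levels sorted alphabetically
f_levels  <- factor(x, levels = days)  # calendar order, still unordered

levels(f_default)
as.character(sort(f_levels))
is.ordered(f_levels)
```

Only comparisons such as `<` between values need a truly ordered factor; sorting, tables and plots follow the level order either way.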

-- Bert


On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger  wrote:
> If days of the week is not an Ordered Factor, then it will be sorted
> alphabetically.
> Fr Mo Sa Su Th Tu We
>
> Rich
>
> On Mon, Dec 2, 2013 at 6:24 AM, Bill wrote:
>> I don't understand why the author of the code decided to make the factor
>> days_of_week into an ordered factor. Anyone know why this should be done?
>> Thank you.

Re: [R] generate multiple probability distributions

2013-12-02 Thread David Carlson
You can use recycling to simplify things:

> set.seed(42)
> x <- seq(0,12)
> bin.df <- as.data.frame(
+ rbind( cbind(x, prob=dbinom(x,12,1/6), p=1/6),
+        cbind(x, prob=dbinom(x,12,1/3), p=1/3),
+        cbind(x, prob=dbinom(x,12,1/2), p=1/2),
+        cbind(x, prob=dbinom(x,12,2/3), p=2/3)
+ ))
> bin.df$p <- factor(bin.df$p, labels=c("1/6", "1/3", "1/2", "2/3"))
> str(bin.df)
'data.frame':   52 obs. of  3 variables:
 $ x   : num  0 1 2 3 4 5 6 7 8 9 ...
 $ prob: num  0.1122 0.2692 0.2961 0.1974 0.0888 ...
 $ p   : Factor w/ 4 levels "1/6","1/3","1/2",..: 1 1 1 1 1 1 1 1 1 1 ...
>
> bin.df.2 <- data.frame( x,
+ prob = c(dbinom(x,12,1/6), dbinom(x,12,1/3),
+          dbinom(x,12,1/2), dbinom(x,12,2/3)),
+ p = rep(c(1/6, 1/3, 1/2, 2/3), each=length(x))
+ )
> bin.df.2$p <- factor(bin.df.2$p, labels=c("1/6", "1/3", "1/2", "2/3"))
> str(bin.df.2)
'data.frame':   52 obs. of  3 variables:
 $ x   : int  0 1 2 3 4 5 6 7 8 9 ...
 $ prob: num  0.1122 0.2692 0.2961 0.1974 0.0888 ...
 $ p   : Factor w/ 4 levels "1/6","1/3","1/2",..: 1 1 1 1 1 1 1 1 1 1 ...
> all.equal(bin.df, bin.df.2)
[1] TRUE

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352







[R] ordered factor question

2013-12-02 Thread Mark Leeds
Hi: I asked Bert privately and he recommended posting what I asked/said to
him to the list.

My comment/question was that I looked at the code and didn't actually see
an ordered factor being created. So my guess is that there is confusion
over the use of the term "ordered". I'm not clear on whether the OP means
just ordered alphabetically or actually ordered, as in a truly ordered
factor. My guess is that he means alphabetically, which means it's probably
just the preference of the person who wrote the code to order them in the
way he/she did. As far as I can tell, the code never actually creates a
truly ordered factor.
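
This can be checked directly. The sketch below uses a made-up stand-in for dodgers$day_of_week and builds the relabelled factor in one factor() call instead of the nested ifelse(); the names `full` and `short` are introduced here for illustration. The result is a plain factor whose levels are in calendar order, not an ordered factor.

```r
# Toy stand-in for dodgers$day_of_week (made-up values)
day_of_week <- factor(c("Tuesday", "Friday", "Monday", "Sunday", "Tuesday"))

full  <- c("Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday")
short <- c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun")

# One call: set the level order and the display labels together
ordered_day_of_week <- factor(as.character(day_of_week),
                              levels = full, labels = short)

class(ordered_day_of_week)       # "factor", not c("ordered", "factor")
is.ordered(ordered_day_of_week)
table(ordered_day_of_week)       # columns appear in calendar order
```

One behavioural difference from the nested ifelse(): there, any value that is not Monday to Saturday is coded as Sunday, whereas here an unexpected value becomes NA.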



Re: [R] Days to solstice calculation

2013-12-02 Thread White, William Patrick
Thank you for your response. I looked at the insol package and it does seem to 
contain a daylength function, but it would be more informative and aid in my 
growth as an R user to trace the source of the current problem rather than use 
a package as a work around.  Understanding that an alternate solution exists 
and understanding why the alternate solution works are not the same thing in 
this case.
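
One possible reading of the problem, offered as a sketch rather than a diagnosis: the day length formula in the quoted message depends only on the day-of-year number, so taking which.max() over a year's worth of days returns the same index whether the year has 365 or 366 days. Any one-day difference between 2007 and 2008 would then have to come from how calendar dates are mapped to that index, not from the formula. The function below restates the quoted formula with its parentheses balanced; the latitude is the one hard-coded in the posted TillSolstice().

```r
# Day length in hours; J = day of year, L = latitude in degrees
Daylength <- function(J, L) {
  P <- asin(0.39795 * cos(0.2163108 +
         2 * atan(0.9671396 * tan(0.00860 * (J - 186)))))
  A <- sin(0.8333 * pi / 180) + sin(L * pi / 180) * sin(P)
  B <- cos(L * pi / 180) * cos(P)
  24 - (24 / pi) * acos(A / B)
}

lat <- 39.6981478

# Index of the longest day for a 365-day and a 366-day year
longest_365 <- which.max(Daylength(1:365, lat))
longest_366 <- which.max(Daylength(1:366, lat))
longest_365 == longest_366

# $yday counts from 0, so Jan 1 = 1 corresponds to yday + 1
jan1_yday <- as.POSIXlt(as.Date("2008-01-01"))$yday
jan1_yday
```

Note also that the posted code adds 2 to $yday, which shifts every index by one relative to the Jan 1 = 1 convention described in the message.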

From: skalp.oet...@gmail.com on behalf of Pascal Oettli
Sent: Monday, December 2, 2013 2:22 AM
To: White, William Patrick
Cc: r-help@R-project.org
Subject: Re: [R] Days to solstice calculation

Hello,

It seems that this kind of calculation is done in package 'insol'.

Regards,
Pascal


On 2 December 2013 15:26, White, William Patrick  wrote:
> Hello,
> I've come across a problem in developing a set of custom functions to
> calculate the number of hours of daylight at a given latitude, and the number
> of days by which a date precedes or follows the summer solstice. I discovered
> an inconsistency concerning leap years between my derived values and those
> from the US naval databases. As far as I can figure, the inconsistency
> arises either in the calculation I used, derived from an ecological modeling
> study in the 90's, in my understanding of the way R itself handles dates, or
> in my code. I feel like I must be missing something fundamental here and
> could use a little guidance. The first function returns the hours of daylight
> given a latitude and the Julian day of the year (i.e. Jan 1 = 1 and so on).
> This appears to be very accurate. The second function takes a given date,
> extracts the year, determines the number of days in it, uses the first
> function to calculate the hours of daylight in each day, and returns the
> longest or shortest one (Summer or Winter Solstice). But, in the case of leap
> years and non leap years, the date returned is identical, as is evidenced by
> Jan 1 in the provided examples being 170 days from summer solstice in both
> 2008 and 2007. This was not the case; the solstice should vary by one day
> between these years. Code is provided below and any help is appreciated.
> Patrick
> P.S. Apologies to you southern ducks: your summer and winter solstices are 
> the reverse of my code nomenclature. I'm working with a northern dataset.
>
> Daylength <- function(J,L){
> #Amount of daylight
> #Ecological Modeling, volume 80 (1995) pp. 87-95, "A Model Comparison for 
> #Daylength as a Function of Latitude and Day of the Year."
> #D = Daylight length
> #L = Latitude in Degrees
> #J = Day of the year (Julian)
> P <- asin(.39795*cos(.2163108 + 2*atan(.9671396*tan(.00860*(J-186)))))
> A <- sin(0.8333*pi/180)+sin(L*pi/180)*sin(P)
> B <- cos(L*pi/180)*cos(P)
> D <- 24 - (24/pi)* acos(A/B)
> return(D)
> }
>
> #Example today and here
> Daylength(2,39.7505)
>
> TillSolstice <- function(date,solstice){
> Yr <- as.POSIXlt(date)$year+1900
> a <- as.Date(paste(as.character(Yr),as.character(rep("-01-01", 
> length(Yr))),sep = ""))
> b <- as.Date(paste(as.character(Yr),as.character(rep("-12-31", 
> length(Yr))),sep = ""))
> Winter <- NA
> Summer <- NA
> for (g in 1: length(a)){
> if(is.na(a[g])== FALSE){
> if(is.na(b[g])== FALSE){
>   cc <- seq.int(a[g],b[g], by = '1 day')
>   d <- length(cc)
>   e <- strptime(cc, "%Y-%m-%d")$yday+2
>   f <- Daylength(e,39.6981478)
>   Winter[g] <- which.min(f)
>   Summer[g] <- which.max(f)
> }
> }
> if(is.na(a[g])== TRUE){
>  Winter[g] <- NA
>   Summer[g] <- NA
> }
> if(is.na(b[g])== TRUE){
>  Winter[g] <- NA
>   Summer[g] <- NA
> }
>
>
> }
> #Days until solstice
> if (solstice =='S'){Countdown <- Summer - (strptime(date, "%Y-%m-%d")$yday+2)}
> if (solstice =='W'){Countdown <- Winter - (strptime(date, "%Y-%m-%d")$yday+2)}
> return(Countdown)
> }
>
> Nonleap <- TillSolstice(seq(as.Date("2007/1/1"), as.Date("2007/12/31"),
>   by = "1 day"), solstice = 'S')
> Leap <- TillSolstice(seq(as.Date("2008/1/1"), as.Date("2008/12/31"),
>   by = "1 day"), solstice = 'S')
> head(Nonleap)
> tail(Nonleap)
> length(Nonleap)
> head(Leap)
> tail(Leap)
> length(Leap)
>
>



--
Pascal Oettli
Project Scientist
JAMSTEC
Yokohama, Japan



Re: [R] How do I extract Random Forest Terms and Probabilities?

2013-12-02 Thread Liaw, Andy
#2 can be done simply with predict(fmi, type="prob").  See the help page for 
predict.randomForest().

Best,
Andy


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of arun
Sent: Tuesday, November 26, 2013 6:57 PM
To: R help
Subject: Re: [R] How do I extract Random Forest Terms and Probabilities?



Hi,
For the first part, you could do:

fmi2 <- fmi 
attributes(fmi2$terms) <- NULL
capture.output(fmi2$terms)
#[1] "Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width"
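
The same idea can be sketched with base R only. This is a minimal illustration, not the method used in the thread: a terms object built with terms() stands in for fmi$terms (an assumption), and the models data frame with its column names is hypothetical, following the layout asked for in the question.

```r
# Stand-in for fmi$terms: a terms object built with base R only
# (assumption: the fitted model's terms component behaves the same way).
tt <- terms(Species ~ ., data = iris)

# formula() drops the extra terms attributes; deparse() turns it into text.
# A large width.cutoff keeps long formulas on a single line.
f_chr <- paste(deparse(formula(tt), width.cutoff = 500L), collapse = " ")

# Hypothetical bookkeeping table, following the layout in the question.
models <- data.frame(ModelID = "001", Type = "RF", Formula = f_chr,
                     stringsAsFactors = FALSE)
print(models)
```

Storing the formula as a plain character string keeps the data frame easy to write out with write.csv().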

A.k.

On Tuesday, November 26, 2013 3:55 PM, "Lopez, Dan"  wrote:
Hi R Experts,

I need your help with two questions regarding randomForest.


1.       When I run a Random Forest model how do I extract the formula I used 
so that I can store it in a character vector in a dataframe?
For example the dataframe might look like this if I am running models using the 
IRIS dataset
#ModelID,Type,

#001,RF,Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

library(randomForest)
fmi<-randomForest(Species~.,iris,mtry=3,ntree=500)
#I know one place where the information is in fmi$terms but not sure how to 
#extract just the formula info. Or perhaps there is somewhere else in fmi that I 
#could get this?


2.       How do I get the probabilities (probability-like values) from the 
model that was run? I know for the test set I can use predict. And I know to 
extract the classifications from the model I use fmi$predicted. But where are 
the probabilities?


Dan
Workforce Analyst
HRIM - Workforce Analytics & Metrics
LLNL


    [[alternative HTML version deleted]]



[R] vcf, plink and other files in the /demo of a package

2013-12-02 Thread Federico Calboli
Hi All,

together with colleagues we are planning to submit a 2.0 version of a package 
we have on CRAN.  Because the package deals with high throughput genomic data 
we thought it would be nice to have some sort of guidance for the users.  This 
should ideally mean a 'vignette', but at the time of writing nobody had time to 
set one up.  What we have is three scripts that are heavily commented and a 
bunch of files (plink binary files and vcf files) that provide the 'example' 
data for these scripts.  I was wondering whether the /demo directory would be 
an appropriate place to put these scripts and the related data.  

I ask because I am checking the package build and I get:

checking index information ... WARNING
Demo index entries without corresponding demo:
[1] "/plink/MultiPhen_plink" "/simul/MultiPhen_simul" "/vcf/MultiPhen_vcf"
See the information on INDEX files and package subdirectories in the
chapter ‘Creating R packages’ of the ‘Writing R Extensions’ manual.

despite the fact I did create a 00Index file in /demo:

/demo$ cat 00Index
/plink/MultiPhen_plink MultiPhen demo of how to use PLINK BED files
/simul/MultiPhen_simul MultiPhen demo of how to run a simulation with 
MultiPhen
/vcf/MultiPhen_vcf MultiPhen demo of how to use data in vcf format

What am I missing?  Given the state of the documentation (i.e. no vignette and 
demo scripts that rely on data that is not in .rda format) would we be better 
off removing this stuff altogether?


BW

F







Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread peter dalgaard

On 02 Dec 2013, at 16:35 , Bert Gunter  wrote:

> Not true, Rich.
> 
>> z <-factor(letters[1:3],lev=letters[3:1])
>> sort(z)
> [1] c b a
> Levels: c b a
> 
> What you say is true only for the **default** sort order.
> 
> (Although maybe the code author didn't realize this either)

The coding is certainly clunky (the phrase about writing FORTRAN in any 
language springs to mind, only with SAS instead of FORTRAN):
>>> 
>>> dodgers$ordered_day_of_week <- with(data=dodgers,
>>>  ifelse ((day_of_week == "Monday"),1,
>>>  ifelse ((day_of_week == "Tuesday"),2,
>>>  ifelse ((day_of_week == "Wednesday"),3,
>>>  ifelse ((day_of_week == "Thursday"),4,
>>>  ifelse ((day_of_week == "Friday"),5,
>>>  ifelse ((day_of_week == "Saturday"),6,7)))))))
>>> dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
>>> levels=1:7,
>>> labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
>>> 

This'll do:

dodgers$ordered_day_of_week <- factor(dodgers$day_of_week,
levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", 
"Sunday"),
labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))

And BTW, it doesn't (and didn't) create an ordered factor, just a factor with a 
different level ordering.
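
A small sketch of that distinction, using made-up sample values rather than the dodgers data: specifying levels changes the sort and plotting order, while only ordered = TRUE (or ordered()) produces an ordered factor.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
x <- c("Friday", "Monday", "Sunday", "Tuesday")   # made-up sample values

f_default <- factor(x)                  # levels sorted alphabetically
f_levels  <- factor(x, levels = days)   # calendar order, still unordered
f_ordered <- factor(x, levels = days, ordered = TRUE)

levels(f_default)
as.character(sort(f_levels))   # sorted by level position, not by name
is.ordered(f_levels)
is.ordered(f_ordered)
f_ordered < "Wednesday"        # comparisons are meaningful for ordered factors
```

For plots and tables the plain factor with calendar-ordered levels is enough; the ordered class mainly matters for comparisons and for the default contrasts in models.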


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Richard M. Heiberger
Bert,
the issue is the sort order of the levels.  Time series graphs in the
alphabetical sort order will be uninterpretable.  I show the three sets of
contrasts for factors, factors with specified levels, and ordered factors.

week <- 
c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
class(week)

week.f <- factor(week)
levels(week.f)

week.l <- factor(week, levels=week)
levels(week.l)

week.o <- ordered(week, levels=week)
levels(week.o)

contrasts(week.f)

contrasts(week.l)

contrasts(week.o)

Rich

On Mon, Dec 2, 2013 at 10:35 AM, Bert Gunter  wrote:
> Not true, Rich.
>
>> z <-factor(letters[1:3],lev=letters[3:1])
>> sort(z)
> [1] c b a
> Levels: c b a
>
> What you say is true only for the **default** sort order.
>
> (Although maybe the code author didn't realize this either)
>
> -- Bert
>
>
> On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger  wrote:
>> If days of the week is not an Ordered Factor, then it will be sorted
>> alphabetically.
>> Fr Mo Sa Su Th Tu We
>>
>> Rich
>>
>> On Mon, Dec 2, 2013 at 6:24 AM, Bill  wrote:
>>> I am reading the code below. It acts on a csv file called dodgers.csv with
>>> the following variables.
>>>
>>>
>>> > print(str(dodgers))  # check the structure of the data frame
>>> 'data.frame':   81 obs. of  12 variables:
>>>  $ month  : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1
>>> 1 ...
>>>  $ day: int  10 11 12 13 14 15 23 24 25 27 ...
>>>  $ attend : int  56000 29729 28328 31601 46549 38359 26376 44014 26345
>>> 44807 ...
>>>  $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7
>>> 1 ...
>>>  $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11
>>> 3 3 3 10 ...
>>>  $ temp   : int  67 58 57 54 57 65 60 63 64 66 ...
>>>  $ skies  : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1
>>> ...
>>>  $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
>>>  $ cap: Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>>>  $ shirt  : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>>>  $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
>>>  $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>>> NULL

>>>
>>> I don't understand why the author of the code decided to make the factor
>>> days_of_week into an ordered factor. Anyone know why this should be done?
>>> Thank you.
>>>
>>> Here is the code:
>>>
>>> # Predictive Model for Los Angeles Dodgers Promotion and Attendance
>>>
>>> library(car)  # special functions for linear regression
>>> library(lattice)  # graphics package
>>>
>>> # read in data and create a data frame called dodgers
>>> dodgers <- read.csv("dodgers.csv")
>>> print(str(dodgers))  # check the structure of the data frame
>>>
>>> # define an ordered day-of-week variable
>>> # for plots and data summaries
>>> dodgers$ordered_day_of_week <- with(data=dodgers,
>>>   ifelse ((day_of_week == "Monday"),1,
>>>   ifelse ((day_of_week == "Tuesday"),2,
>>>   ifelse ((day_of_week == "Wednesday"),3,
>>>   ifelse ((day_of_week == "Thursday"),4,
>>>   ifelse ((day_of_week == "Friday"),5,
>>>   ifelse ((day_of_week == "Saturday"),6,7)))))))
>>> dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
>>> levels=1:7,
>>> labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
>>>
>>> # exploratory data analysis with standard graphics: attendance by day of
>>> week
>>> with(data=dodgers,plot(ordered_day_of_week, attend/1000,
>>> xlab = "Day of Week", ylab = "Attendance (thousands)",
>>> col = "violet", las = 1))
>>>
>>> # when do the Dodgers use bobblehead promotions
>>> with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on
>>> Tuesday
>>>
>>> # define an ordered month variable
>>> # for plots and data summaries
>>> dodgers$ordered_month <- with(data=dodgers,
>>>   ifelse ((month == "APR"),4,
>>>   ifelse ((month == "MAY"),5,
>>>   ifelse ((month == "JUN"),6,
>>>   ifelse ((month == "JUL"),7,
>>>   ifelse ((month == "AUG"),8,
>>>   ifelse ((month == "SEP"),9,10)))))))
>>> dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
>>> labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))
>>>
>>> # exploratory data analysis with standard R graphics: attendance by month
>>> with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month",
>>> ylab = "Attendance (thousands)", col = "light blue", las = 1))
>>>
>>> # exploratory data analysis displaying many variables
>>> # looking at attendance and conditioning on day/night
>>> # the skies and whether or not fireworks are displayed
>>> library(lattice) # used for plotting
>>> # let us prepare a graphical summary of the dodgers data
>>> group.labels <- c("No Fireworks","Fireworks")
>>> group.symbols <- c(21,24)
>>> group.colors <- c("black","black")
>>> group.fill <- c("black","red")
>>> xyplot(attend/1000 ~ temp | skies + day_night,
>>> data = dodgers, groups = fireworks, pch = group.symb

Re: [R] interpretation of MDS plot in random forest

2013-12-02 Thread Liaw, Andy
Yes, that's part of the intention anyway.  One can also use them to do 
clustering.

Best,
Andy
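
For readers wondering what the scaling coordinates are: as far as I can tell, MDSplot() applies classical multidimensional scaling (cmdscale) to 1 - proximity. The sketch below uses base R only, with a Euclidean distance matrix standing in for 1 - proximity (an assumption made so that the example runs without the randomForest package).

```r
# Illustration with base R only: a Euclidean distance matrix stands in for
# 1 - proximity (assumption; swap in 1 - iris.rf$proximity for the forest).
d <- dist(iris[, 1:4])

# Classical (metric) MDS; eig = TRUE also returns the eigenvalues.
mds <- cmdscale(d, k = 2, eig = TRUE)

head(mds$points)      # scaling coordinates: one row per observation
mds$eig[1:2]          # eigenvalues belonging to Dim 1 and Dim 2
mds$GOF               # goodness of fit of the k-dimensional representation
```

The goodness-of-fit values give a rough guide for choosing k: if they are already close to 1 for k = 2, extra dimensions add little.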

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Massimo Bressan
Sent: Monday, December 02, 2013 6:34 AM
To: r-help@r-project.org
Subject: [R] interpretation of MDS plot in random forest

Given this general example:

library(randomForest)

set.seed(1)

data(iris)

iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)

#varImpPlot(iris.rf)

#varUsed(iris.rf)

MDSplot(iris.rf, iris$Species)

I've been reading the documentation about random forest (to the best of my 
poor knowledge), but I'm having trouble with the correct interpretation of 
the MDS plot and I hope someone can give me some clues.

What is meant by "the scaling coordinates of the proximity matrix"?


I think I understand that the objective here is to present the distance 
among species in a parsimonious, visual way (of lower dimensionality).

Is there therefore a parallel with the principal components of a 
classical PCA?

Are the scaling coordinates DIM 1 and DIM 2 the eigenvectors of the 
proximity matrix?

If that is correct, how would you find the eigenvalues for those 
eigenvectors? And what do the eigenvalues represent?


What do these two dimensions in the plot say about the different 
iris species? Their relative distance in terms of proximity within the 
space DIM 1 and DIM 2?

How should one choose the k parameter (number of dimensions for the 
scaling coordinates)?

And finally, how would you explain the plot in simple terms?

Thank you for any feedback
Best regards



Re: [R] vcf, plink and other files in the /demo of a package

2013-12-02 Thread Duncan Murdoch


Those index entries should be filenames of the demo files. The leading 
slash likely means R will interpret them as absolute paths, not relative 
paths, and that won't work.


A more usual way to do it would be to give them simple names, e.g. 
MultiPhen_plink (corresponding to MultiPhen_plink.R).


Since your demos refer to data files that are not embedded in the R 
code, you need to put those data files somewhere. They should go into 
the data directory if they are data a user can read (and then you need 
to follow the restrictions on things in that directory), or into your 
own directory below inst, to just be installed and available for use. In 
the latter case use system.file() to refer to them from within your demo 
code.
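
A minimal sketch of the system.file() approach. The file name example.vcf and the inst/extdata location are assumptions for illustration; only the package name MultiPhen comes from the thread. With this layout, 00Index would list bare demo names such as MultiPhen_plink, with no leading slash or subdirectory.

```r
# system.file() returns the full path of an installed file, or "" if the
# file (or the package) is not found.
desc <- system.file("DESCRIPTION", package = "stats")   # present in any R installation
desc

# Hypothetical layout: example data shipped as inst/extdata/example.vcf in a
# package called "MultiPhen"; the file name and folder are assumptions.
vcf_path <- system.file("extdata", "example.vcf", package = "MultiPhen")
if (nzchar(vcf_path)) {
  message("example data found at ", vcf_path)
} else {
  message("example data not installed; nothing to read")
}
```

Checking for the empty string before reading keeps the demo from failing with an obscure file error when the data are missing.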


Duncan Murdoch



Re: [R] Days to solstice calculation

2013-12-02 Thread David Carlson
They are a day apart. Summer solstice is day 172 in both cases
so the calendar dates should be one day apart and they are (June
22 in 2007 and June 21 in 2008):

> strptime("2007-06-22", format="%Y-%m-%d")$yday
[1] 172
> strptime("2008-06-21", format="%Y-%m-%d")$yday
[1] 172

Your Daylength() function gives the same values for days 1-365.
So it will always give 172 as the summer solstice and 355 as the
winter solstice. The leap year just adds a calculation for day
366. No need to calculate them over and over.
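
The point can be checked with a short sketch. It reuses the posted Daylength() formula with the parentheses balanced, and as a simplification it feeds the day number 1..365 (or 1..366) directly instead of the yday+2 offset used in TillSolstice().

```r
# Daylength() as posted, with the parentheses balanced.
Daylength <- function(J, L) {
  P <- asin(.39795 * cos(.2163108 + 2 * atan(.9671396 * tan(.00860 * (J - 186)))))
  A <- sin(0.8333 * pi / 180) + sin(L * pi / 180) * sin(P)
  B <- cos(L * pi / 180) * cos(P)
  24 - (24 / pi) * acos(A / B)
}

lat <- 39.6981478   # latitude used in the original post

# The formula only sees the day number, so both years give the same index.
s2007 <- which.max(Daylength(1:365, lat))
s2008 <- which.max(Daylength(1:366, lat))

# Turning that index into a calendar date is where the leap day shows up.
d2007 <- as.Date("2007-01-01") + (s2007 - 1)
d2008 <- as.Date("2008-01-01") + (s2008 - 1)
c(s2007, s2008)
format(c(d2007, d2008), "%Y-%m-%d")
```

The day-of-year index is identical in both years; only its translation to a calendar date shifts by one day after February 29.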

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


Re: [R] Days to solstice calculation

2013-12-02 Thread White, William Patrick
Thank you. I can't believe I didn't notice that. What a relief.


[R] Question about ifelse() XXXX

2013-12-02 Thread Dan Abner
 Hi all,

Can anyone explain what is happening with element 4,4 of c1? ifelse()
is not recognizing it as value 1:

> c1
          q1        q2        q3        q4
q1 1.0000000 0.6668711 0.6948419 0.5758860
q2 0.6668711 1.0000000 0.6040746 0.4917447
q3 0.6948419 0.6040746 1.0000000 0.4730732
q4 0.5758860 0.4917447 0.4730732 1.0000000
> ifelse(c1==1,1,0)
   q1 q2 q3 q4
q1  1  0  0  0
q2  0  1  0  0
q3  0  0  1  0
q4  0  0  0  0
> c1[4,4]
[1] 1



Re: [R] Question about ifelse() XXXX

2013-12-02 Thread Duncan Murdoch

On 02/12/2013 2:08 PM, Dan Abner wrote:

  Hi all,

Can anyone explain what is happening with element 4,4 of c1? ifelse()
is not recognizing it as value 1:


FAQ 7.31.

Duncan Murdoch



> c1
          q1        q2        q3        q4
q1 1.0000000 0.6668711 0.6948419 0.5758860
q2 0.6668711 1.0000000 0.6040746 0.4917447
q3 0.6948419 0.6040746 1.0000000 0.4730732
q4 0.5758860 0.4917447 0.4730732 1.0000000
> ifelse(c1==1,1,0)
q1 q2 q3 q4
q1  1  0  0  0
q2  0  1  0  0
q3  0  0  1  0
q4  0  0  0  0
> c1[4,4]
[1] 1
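
FAQ 7.31 concerns floating-point comparison: a value can print as 1 without being exactly 1. A minimal sketch with a made-up value (not the poster's c1 matrix), showing a tolerance-based test in place of ==:

```r
# Made-up value that prints as 1 at the default 7 significant digits
x <- 1 - 1e-10
print(x)
x == 1                       # exact comparison fails
isTRUE(all.equal(x, 1))      # comparison with a numerical tolerance

# Same idea for a whole matrix: flag entries within a tolerance of 1
m <- matrix(c(1, 0.6, 0.6, x), nrow = 2)
ifelse(abs(m - 1) < 1e-8, 1, 0)
```

The tolerance 1e-8 is an arbitrary choice for this sketch; pick one that suits the scale of the data.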



Re: [R] generate multiple probability distributions

2013-12-02 Thread Michael Friendly

On 12/2/2013 10:12 AM, Duncan Murdoch wrote:
dbinom can take vector inputs for the parameters, so this would be a 
bit simpler:


x <- seq(0,12)
x <- rep(x, 4)
p <- rep(c(1/6, 1/3, 1/2, 2/3), each=13)
bin.df <- data.frame(x, prob = dbinom(x, 12, p), p)


Thanks, Duncan

What I was missing was how dbinom() could use vector inputs for both x & p.
Using expand.grid() gives me what I want and is more general for my
purposes.  The main purpose of making p a factor is to facilitate
plots of prob ~ x|p.


XP <-expand.grid(x=0:12, p=c(1/6, 1/3, 1/2, 2/3))
bin.df <- data.frame(XP, prob=dbinom(XP[,"x"], 12, XP[,"p"]))
bin.df$p <- factor(bin.df$p, labels=c("1/6", "1/3", "1/2", "2/3"))
str(bin.df)
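
A quick sanity check of the vectorised call, as a sketch: each value of p defines a complete binomial distribution over x = 0..12, so the probabilities for each p should sum to 1. The names prob and totals are introduced here for illustration.

```r
XP <- expand.grid(x = 0:12, p = c(1/6, 1/3, 1/2, 2/3))
prob <- dbinom(XP$x, size = 12, prob = XP$p)   # one density per row of XP

# Each value of p defines a complete distribution, so its 13 probabilities
# should add up to 1.
totals <- tapply(prob, XP$p, sum)
totals

# Spot check against a scalar call
all.equal(prob[XP$x == 3 & XP$p == 1/2], dbinom(3, 12, 1/2))
```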

-Michael


--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



[R] legend position

2013-12-02 Thread philippe massicotte
Hi all. 
I'm plotting a raster and I can't find the proper way to move the legend. For 
example,
library(raster)
r = raster(system.file("external/test.grd", package="raster"))
plot(r)
How can I put the legend at the desired position?
Thanks in advance,
Phil 


[R] plus/minus +/- in factor; not plotmath not expression

2013-12-02 Thread Jacob Wegelin


I want to put the "plus or minus" symbol into a character variable, so that this can be 
turned into a factor and be displayed in the "strip" of a faceted ggplot2 plot.

A very nice solution, thanks to Professor Ripley's post of Nov 16, 2008; 
3:13pm, visible at 
http://r.789695.n4.nabble.com/Symbols-to-use-in-text-td874239.html and 
subsequently http://www.fileformat.info/info/unicode/char/00b1/index.htm, is:

junk<- "\u00B1"
print(junk)

#   This works very nicely. For instance:

junk<-data.frame(gug=c(
rep( "\u00B1 1.2", 10)
,
rep( "\u00B1 2.3", 10)
)
)
junk$eks<-1:nrow(junk)
junk$why<-with(junk, as.numeric(factor(gug)) + eks)  # factor() keeps this working if gug is stored as character
print(summary(junk))
library(ggplot2)
print(
ggplot(data=junk, mapping=aes(x=eks, y=why))
+ geom_point()
+ facet_grid(. ~ gug)
)

This works very nicely on my system, but I just wanted to enquire:

Is this machine-independent and stable?

Is there a "native R" way to do this?

I did this in:


sessionInfo()

R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] ggplot2_0.9.3.1

loaded via a namespace (and not attached):
 [1] colorspace_1.2-0   dichromat_1.2-4digest_0.6.0   grid_2.15.3   
 gtable_0.1.2   labeling_0.1
 [7] MASS_7.3-23munsell_0.4plyr_1.8   proto_0.3-10  
 psych_1.2.8RColorBrewer_1.0-5
[13] reshape2_1.2.2 scales_0.2.3   stringr_0.6.2 




Incidentally (and for the sake of keyword searches): Although a google search 
initially led me to posts about expression() and plotmath, those eventually had 
nothing to do with the solution.

Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
830 E. Main St., Seventh Floor
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. 
CTSA grant: UL1TR58

URL: http://www.people.vcu.edu/~jwegelin



Re: [R] Question about ifelse() XXXX

2013-12-02 Thread William Dunlap
What is the value of
  diag(c1) - 1
?

(Or, use digits=16 when printing c1.)
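
To make the point concrete, here is a small sketch of the same symptom. The matrix is a stand-in (the poster's c1 is not available), and the offset of one machine epsilon and the tolerance 1e-8 are assumptions chosen for illustration.

```r
# Stand-in for the poster's matrix: the [4,4] entry is off by one
# machine epsilon, as can happen after floating-point arithmetic.
c1 <- diag(4)
c1[4, 4] <- 1 - .Machine$double.eps

c1[4, 4] == 1                 # FALSE: not exactly 1
print(c1[4, 4])               # still prints as 1 at the default 7 digits
print(c1[4, 4], digits = 17)  # more digits reveal the difference
diag(c1) - 1                  # nonzero in the last position

# Compare with a tolerance instead of ==
near_one <- abs(c1 - 1) < 1e-8        # tolerance is an arbitrary choice
ifelse(near_one, 1, 0)
isTRUE(all.equal(c1[4, 4], 1))        # TRUE
```

This is the situation FAQ 7.31 describes: exact equality tests on doubles are fragile, and all.equal() or an explicit tolerance is usually what is wanted.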

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Dan Abner
> Sent: Monday, December 02, 2013 11:09 AM
> To: r-help@r-project.org
> Subject: [R] Question about ifelse() 
> 
>  Hi all,
> 
> Can anyone explain what is happening with element 4,4 of c1? ifelse()
> is not recognizing it as value 1:
> 
> > c1
>   q1q2q3q4
> q1 1.000 0.6668711 0.6948419 0.5758860
> q2 0.6668711 1.000 0.6040746 0.4917447
> q3 0.6948419 0.6040746 1.000 0.4730732
> q4 0.5758860 0.4917447 0.4730732 1.000
> > ifelse(c1==1,1,0)
>q1 q2 q3 q4
> q1  1  0  0  0
> q2  0  1  0  0
> q3  0  0  1  0
> q4  0  0  0  0
> > c1[4,4]
> [1] 1
> 


Re: [R] legend position

2013-12-02 Thread Carl Witthoft
See ?legend. You can add a legend directly to an existing plot. An
example:

legend('topright',c('hot','cold'),lty=1,col=c('red','green'),bg='white')

Now if you're trying to place the legend outside the plot area (i.e. in some
other part of the window),
you'll need to invoke par(xpd=TRUE). See the help at ?par.


Filoche wrote
> Hi all. 
> I'm ploting a raster and I can't find the proper way to move the legend.
> For example,
> r = raster(system.file("external/test.grd", package="raster"))plot(r)
> How can I put the legend at the desired position?
> Thank in advance,Phil   


Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Robert Baer

On 12/2/2013 9:35 AM, Bert Gunter wrote:
> Not true, Rich.
The point about alphabetical ordering explains why the author likely 
explicitly set the levels for the factor, though.

As to why ordered factors, we may never know, but one possible 
explanation is that at some point he was going to use statistics where 
he wanted to use polynomial contrasts. See

options()$contrasts

Note that the default contrast type differs for normal factors and ordered 
factors.
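
A short sketch of that difference, assuming the default options are in effect. The three-day vector and the names f and o are invented for the illustration.

```r
# With default options this shows
#   unordered = "contr.treatment", ordered = "contr.poly"
options()$contrasts

days <- c("Mon", "Tue", "Wed")
f <- factor(days, levels = days)                   # plain factor
o <- factor(days, levels = days, ordered = TRUE)   # ordered factor

contrasts(f)   # treatment (dummy) coding: 0/1 columns
contrasts(o)   # polynomial coding: columns .L (linear) and .Q (quadratic)
```

So the same level order leads to different model matrices in lm() and friends, depending only on whether the factor is ordered.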



>> z <-factor(letters[1:3],lev=letters[3:1])
>> sort(z)
> [1] c b a
> Levels: c b a
>
> What you say is true only for the **default** sort order.
>
> (Although maybe the code author didn't realize this either)
>
> -- Bert
>
>
> On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger  wrote:
>> If days of the week is not an Ordered Factor, then it will be sorted
>> alphabetically.
>> Fr Mo Sa Su Th Tu We
>>
>> Rich
>>
>> On Mon, Dec 2, 2013 at 6:24 AM, Bill  wrote:
>>> I am reading the code below. It acts on a csv file called dodgers.csv with
>>> the following variables.
>>>
>>>
>>> print(str(dodgers))  # check the structure of the data frame
>>> 'data.frame':   81 obs. of  12 variables:
>>>   $ month  : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1
>>> 1 ...
>>>   $ day: int  10 11 12 13 14 15 23 24 25 27 ...
>>>   $ attend : int  56000 29729 28328 31601 46549 38359 26376 44014 26345
>>> 44807 ...
>>>   $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7
>>> 1 ...
>>>   $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11
>>> 3 3 3 10 ...
>>>   $ temp   : int  67 58 57 54 57 65 60 63 64 66 ...
>>>   $ skies  : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1
>>> ...
>>>   $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
>>>   $ cap: Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>>>   $ shirt  : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>>>   $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
>>>   $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>>> NULL
>>> I don't understand why the author of the code decided to make the factor
>>> days_of_week into an ordered factor. Anyone know why this should be done?
>>> Thank you.
>>>
>>> Here is the code:
>>>
>>> # Predictive Model for Los Angeles Dodgers Promotion and Attendance
>>>
>>> library(car)  # special functions for linear regression
>>> library(lattice)  # graphics package
>>>
>>> # read in data and create a data frame called dodgers
>>> dodgers <- read.csv("dodgers.csv")
>>> print(str(dodgers))  # check the structure of the data frame
>>>
>>> # define an ordered day-of-week variable
>>> # for plots and data summaries
>>> dodgers$ordered_day_of_week <- with(data=dodgers,
>>>ifelse ((day_of_week == "Monday"),1,
>>>ifelse ((day_of_week == "Tuesday"),2,
>>>ifelse ((day_of_week == "Wednesday"),3,
>>>ifelse ((day_of_week == "Thursday"),4,
>>>ifelse ((day_of_week == "Friday"),5,
>>>ifelse ((day_of_week == "Saturday"),6,7)))))))
>>> dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
>>> levels=1:7,
>>> labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
>>>
>>> # exploratory data analysis with standard graphics: attendance by day of
>>> week
>>> with(data=dodgers,plot(ordered_day_of_week, attend/1000,
>>> xlab = "Day of Week", ylab = "Attendance (thousands)",
>>> col = "violet", las = 1))
>>>
>>> # when do the Dodgers use bobblehead promotions
>>> with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on
>>> Tuesday
>>>
>>> # define an ordered month variable
>>> # for plots and data summaries
>>> dodgers$ordered_month <- with(data=dodgers,
>>>ifelse ((month == "APR"),4,
>>>ifelse ((month == "MAY"),5,
>>>ifelse ((month == "JUN"),6,
>>>ifelse ((month == "JUL"),7,
>>>ifelse ((month == "AUG"),8,
>>>ifelse ((month == "SEP"),9,10)))))))
>>> dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
>>> labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))
>>>
>>> # exploratory data analysis with standard R graphics: attendance by month
>>> with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month",
>>> ylab = "Attendance (thousands)", col = "light blue", las = 1))
>>>
>>> # exploratory data analysis displaying many variables
>>> # looking at attendance and conditioning on day/night
>>> # the skies and whether or not fireworks are displayed
>>> library(lattice) # used for plotting
>>> # let us prepare a graphical summary of the dodgers data
>>> group.labels <- c("No Fireworks","Fireworks")
>>> group.symbols <- c(21,24)
>>> group.colors <- c("black","black")
>>> group.fill <- c("black","red")
>>> xyplot(attend/1000 ~ temp | skies + day_night,
>>>  data = dodgers, groups = fireworks, pch = group.symbols,
>>>  aspect = 1, cex = 1.5, col = group.colors, fill = group.fill,
>>>  layout = c(2, 2), type = c("p","

Re: [R] legend position

2013-12-02 Thread philippe massicotte
Thank you for the reply.
If I'm not wrong, legend(...) works for discrete elements. I'm not sure 
how to use it for a colorbar legend such as the one in the example below.
Phil
> Date: Mon, 2 Dec 2013 11:49:19 -0800
> From: c...@witthoft.com
> To: r-help@r-project.org
> Subject: Re: [R] legend position
> 
> See ?legend .   you can add a legend directly to an existing plot.  An
> example:
> 
> legend('topright',c('hot','cold'),lty=1,col=c('red','green'),bg='white')
> 
> Now if you're trying to place the legend outside the plot area (i.e. in some
> other part of the window),
> you'll need to invoke par(xpd=TRUE) . See the help at ?par .
> 
> 
> Filoche wrote
> > Hi all. 
> > I'm ploting a raster and I can't find the proper way to move the legend.
> > For example,
> > r = raster(system.file("external/test.grd", package="raster"))plot(r)
> > How can I put the legend at the desired position?
> > Thank in advance,Phil 


Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Bert Gunter
Did you not see Mark Leeds's post?

The OP apparently did not really mean R's "ordered factors" as
produced by the R ordered() constructor; rather, he meant "factors
with levels ordered differently than the default", for which Rich's
answer was apropos. Mine -- and now yours -- in which we wrongly
assumed the OP knew what he was saying by "ordered factor", was not.
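
For the archives, a minimal sketch of the three cases being distinguished here. The toy vector days and the object names are made up; factor() and ordered() are the base R constructors mentioned above.

```r
days <- c("Friday", "Monday", "Sunday", "Monday")
week <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

f_default <- factor(days)                  # levels in alphabetical order
f_levels  <- factor(days, levels = week)   # custom level order, NOT ordered
f_ordered <- ordered(days, levels = week)  # a genuine ordered factor

levels(f_default)            # "Friday" "Monday" "Sunday"
is.ordered(f_levels)         # FALSE
is.ordered(f_ordered)        # TRUE
as.character(sort(f_levels)) # sorted by level order: Monday, Monday, Friday, Sunday
```

Plots and tables follow the level order in both of the last two cases; only modelling functions (through the default contrasts) treat the ordered factor differently.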

-- Bert

On Mon, Dec 2, 2013 at 11:51 AM, Robert Baer  wrote:
>
> On 12/2/2013 9:35 AM, Bert Gunter wrote:
>
> Not true, Rich.
>
> The point about alphabetical ordering explains why the author likely
> explicitly set the levels for the factor, though.
>
> As to why ordered factors, we may never know, but one possible explanation
> is that at some point he was going to use statistics where he wanted to use
> polynomial contrasts. See
>
> options()$contrasts
>
> Note that the default contrast type differs for normal factors and ordered
> factors.
>
>
>
> z <-factor(letters[1:3],lev=letters[3:1])
> sort(z)
>
> [1] c b a
> Levels: c b a
>
> What you say is true only for the **default** sort order.
>
> (Although maybe the code author didn't realize this either)
>
> -- Bert
>
>
> On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger  wrote:
>
> If days of the week is not an Ordered Factor, then it will be sorted
> alphabetically.
> Fr Mo Sa Su Th Tu We
>
> Rich
>
> On Mon, Dec 2, 2013 at 6:24 AM, Bill  wrote:
>
> I am reading the code below. It acts on a csv file called dodgers.csv with
> the following variables.
>
>
> print(str(dodgers))  # check the structure of the data frame
>
> 'data.frame':   81 obs. of  12 variables:
>  $ month  : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1
> 1 ...
>  $ day: int  10 11 12 13 14 15 23 24 25 27 ...
>  $ attend : int  56000 29729 28328 31601 46549 38359 26376 44014 26345
> 44807 ...
>  $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7
> 1 ...
>  $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11
> 3 3 3 10 ...
>  $ temp   : int  67 58 57 54 57 65 60 63 64 66 ...
>  $ skies  : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1
> ...
>  $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
>  $ cap: Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>  $ shirt  : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>  $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
>  $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> NULL
>
> I don't understand why the author of the code decided to make the factor
> days_of_week into an ordered factor. Anyone know why this should be done?
> Thank you.
>
> Here is the code:
>
> # Predictive Model for Los Angeles Dodgers Promotion and Attendance
>
> library(car)  # special functions for linear regression
> library(lattice)  # graphics package
>
> # read in data and create a data frame called dodgers
> dodgers <- read.csv("dodgers.csv")
> print(str(dodgers))  # check the structure of the data frame
>
> # define an ordered day-of-week variable
> # for plots and data summaries
> dodgers$ordered_day_of_week <- with(data=dodgers,
>   ifelse ((day_of_week == "Monday"),1,
>   ifelse ((day_of_week == "Tuesday"),2,
>   ifelse ((day_of_week == "Wednesday"),3,
>   ifelse ((day_of_week == "Thursday"),4,
>   ifelse ((day_of_week == "Friday"),5,
>   ifelse ((day_of_week == "Saturday"),6,7)))))))
> dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
> levels=1:7,
> labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
>
> # exploratory data analysis with standard graphics: attendance by day of
> week
> with(data=dodgers,plot(ordered_day_of_week, attend/1000,
> xlab = "Day of Week", ylab = "Attendance (thousands)",
> col = "violet", las = 1))
>
> # when do the Dodgers use bobblehead promotions
> with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on
> Tuesday
>
> # define an ordered month variable
> # for plots and data summaries
> dodgers$ordered_month <- with(data=dodgers,
>   ifelse ((month == "APR"),4,
>   ifelse ((month == "MAY"),5,
>   ifelse ((month == "JUN"),6,
>   ifelse ((month == "JUL"),7,
>   ifelse ((month == "AUG"),8,
>   ifelse ((month == "SEP"),9,10)))))))
> dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
> labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))
>
> # exploratory data analysis with standard R graphics: attendance by month
> with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month",
> ylab = "Attendance (thousands)", col = "light blue", las = 1))
>
> # exploratory data analysis displaying many variables
> # looking at attendance and conditioning on day/night
> # the skies and whether or not fireworks are displayed
> library(lattice) # used for plotting
> # let us prepare a graphical summary of the dodgers data
> group.labels <- c("No Fireworks","Fireworks")
> group.symbols <- c(21,24)
> group.colors <- c("black","black")
> gr

Re: [R] legend position

2013-12-02 Thread Carl Witthoft
It occurs to me that perhaps you're referring to the 'color bar' on the right
of the plot.  AFAIK you cannot get at that from the raster::plot method.  
However  lattice::levelplot does allow you to manipulate or remove that
colorbar.
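
A small sketch of what that looks like in lattice. It uses the built-in volcano matrix as a stand-in for the raster data, so it only illustrates the colorkey argument and is not a drop-in replacement for the raster plot.

```r
library(lattice)

# Color key placed below the panel instead of on the right
p_bottom <- levelplot(volcano, colorkey = list(space = "bottom"))
print(p_bottom)

# Color key removed altogether
p_none <- levelplot(volcano, colorkey = FALSE)
print(p_none)
```

See ?levelplot for the other components accepted in the colorkey list.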






Re: [R] legend position

2013-12-02 Thread philippe massicotte
Thank you, I'll try to work with lattice.
Regards,Phil

> Date: Mon, 2 Dec 2013 12:06:50 -0800
> From: c...@witthoft.com
> To: r-help@r-project.org
> Subject: Re: [R] legend position
> 
> It occurs to me that perhaps you're referring to the 'color bar' on the right
> of the plot.  AFAIK you cannot get at that from the raster::plot method.  
> However  lattice::levelplot does allow you to manipulate or remove that
> colorbar.
> 
> 
> 
> 


Re: [R] XLConnect readWorksheet comma decimal sign

2013-12-02 Thread David Winsemius

On Dec 2, 2013, at 1:17 AM, Knut Krueger wrote:

> On 29.11.2013 20:39, David Winsemius wrote:
>>> That's impossible, we are used to hitting the comma
>> I don't know what that means.
> it is common here that the decimal sign is a comma

Believe me, I _do_ understand that in Europe it is common to use a comma as a 
decimal-sign. I told you how to adjust that for data input using `read.table` 
at the time of data input. R uses the period internally in all locales as the 
decimal separator. There is no mechanism that I know of that allows the console 
output to be with the period. For output with `write.table` you can again 
specify the use of the comma as the decimal separator and some other character 
as the field separator. In fact you can set that globally with:

?options

> options()$OutDec  # my setting
[1] "."
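
A minimal sketch of those three places where the decimal mark can be set. The data in txt are invented for the illustration, and nothing here involves XLConnect.

```r
# Input: semicolon-separated fields with comma decimals (made-up data)
txt <- "id;value\n1;1,5\n2;2,25"
dat <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")
dat$value                      # numeric; stored internally with "."

# Output: write the numbers back with comma decimals
out <- capture.output(
  write.table(dat, sep = ";", dec = ",", row.names = FALSE, quote = FALSE)
)
out

# Printing and formatting: set OutDec, then restore the old value
old <- options(OutDec = ",")
shown <- format(1.5)
shown
options(old)
```

read.csv2() and write.csv2() wrap the same sep = ";" and dec = "," settings, which is often the more convenient route for files from comma-decimal locales.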

 If you continue having difficulty using XLConnect, then you should contact the 
authors of that package.

> All computers in the cip-pools are using the "comma" (and I think 99.9% of all 
> other computers here)
> Can you imagine what would happen after changing this to a dot?
> Or the other way round, try to get the people in your country to use the comma 
> as separator. It would cause a big jumble.
> 
>> Until you show a reproducible example, we will not be able to offer further 
>> advice: 
> That's the problem ... I am still trying to find out what happened. It was 
> definitely wrong in two cases
> I was sure that I had found the reason when starting this thread...
> 
> Knut
> 

David Winsemius
Alameda, CA, USA



Re: [R] legend position

2013-12-02 Thread David Carlson
It is not straightforward unless you want the legend in the
right or the bottom margins. To put the legend inside the plot
region it is simplest to use image() to plot the raster file and
then image.plot(legend.only=TRUE) to add the legend. In addition
to reading the help page for plot{raster}, you also need the
pages for image{raster} and image.plot{fields}. Here are two
simple examples.

image(r,  col=rev(terrain.colors(255)))
plot(r, horizontal=TRUE, smallplot=c(.15, .5, .84, .86),
legend.only=TRUE)

image(r,  col=rev(terrain.colors(255)))
plot(r, smallplot=c(.15, .17, .5, .85), legend.only=TRUE)

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of philippe
massicotte
Sent: Monday, December 2, 2013 1:22 PM
To: r-help@r-project.org
Subject: [R] legend position

Hi all. 
I'm ploting a raster and I can't find the proper way to move the
legend. For example,
r = raster(system.file("external/test.grd",
package="raster"))plot(r)
How can I put the legend at the desired position?
Thank in advance,Phil 


Re: [R] legend position

2013-12-02 Thread philippe massicotte
Thank you David, it is exactly what I needed.
Regards,Phil

> From: dcarl...@tamu.edu
> To: pmassico...@hotmail.com; r-help@r-project.org
> Subject: RE: [R] legend position
> Date: Mon, 2 Dec 2013 14:29:06 -0600
> 
> It is not straightforward unless you want the legend in the
> right or the bottom margins. To put the legend inside the plot
> region it is simplest to use image() to plot the raster file and
> then image.plot(legend.only=TRUE) to add the legend. In addition
> to reading the help page for plot{raster}, you also need the
> pages for image{raster} and image.plot{fields}. Here are two
> simple examples.
> 
> image(r,  col=rev(terrain.colors(255)))
> plot(r, horizontal=TRUE, smallplot=c(.15, .5, .84, .86),
> legend.only=TRUE)
> 
> image(r,  col=rev(terrain.colors(255)))
> plot(r, smallplot=c(.15, .17, .5, .85), legend.only=TRUE)
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of philippe
> massicotte
> Sent: Monday, December 2, 2013 1:22 PM
> To: r-help@r-project.org
> Subject: [R] legend position
> 
> Hi all. 
> I'm ploting a raster and I can't find the proper way to move the
> legend. For example,
> r = raster(system.file("external/test.grd",
> package="raster"))plot(r)
> How can I put the legend at the desired position?
> Thank in advance,Phil   


Re: [R] plus/minus +/- in factor; not plotmath not expression

2013-12-02 Thread Duncan Murdoch

On 02/12/2013 2:22 PM, Jacob Wegelin wrote:

I want to put the "plus or minus" symbol into a character variable, so that this can be 
turned into a factor and be displayed in the "strip" of a faceted ggplot2 plot.

A very nice solution, thanks to Professor Ripley's post of Nov 16, 2008; 
3:13pm, visible at 
http://r.789695.n4.nabble.com/Symbols-to-use-in-text-td874239.html and 
subsequently http://www.fileformat.info/info/unicode/char/00b1/index.htm, is:

junk<- "\u00B1"
print(junk)

#   This works very nicely. For instance:

junk<-data.frame(gug=c(
rep( "\u00B1 1.2", 10)
,
rep( "\u00B1 2.3", 10)
)
)
junk$eks<-1:nrow(junk)
junk$why<-with(junk, as.numeric(factor(gug)) + eks)  # factor() keeps this working if gug is stored as character
print(summary(junk))
library(ggplot2)
print(
ggplot(data=junk, mapping=aes(x=eks, y=why))
+ geom_point()
+ facet_grid(. ~ gug)
)

This works very nicely on my system, but I just wanted to enquire:

Is this machine-independent and stable?


It is machine-independent and stable because \u00B1 means "Unicode 
PLUS-MINUS SIGN", but it is not device-independent.  There may be a 
graphics device that does not support all Unicode characters.   I'd 
guess it is pretty widely available though.


Is there a "native R" way to do this?


That is native R.

Duncan Murdoch


I did this in:

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] ggplot2_0.9.3.1

loaded via a namespace (and not attached):
   [1] colorspace_1.2-0   dichromat_1.2-4digest_0.6.0   grid_2.15.3 
   gtable_0.1.2   labeling_0.1
   [7] MASS_7.3-23munsell_0.4plyr_1.8   proto_0.3-10
   psych_1.2.8RColorBrewer_1.0-5
[13] reshape2_1.2.2 scales_0.2.3   stringr_0.6.2
>

Incidentally (and for the sake of keyword searches): Although a google search 
initially led me to posts about expression() and plotmath, those eventually had 
nothing to do with the solution.

Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
830 E. Main St., Seventh Floor
P. O. Box 980032
Richmond VA 23298-0032
U.S.A.
CTSA grant: UL1TR58
URL: http://www.people.vcu.edu/~jwegelin



Re: [R] plus/minus +/- in factor; not plotmath not expression

2013-12-02 Thread David Winsemius

On Dec 2, 2013, at 11:22 AM, Jacob Wegelin wrote:

> 
> I want to put the "plus or minus" symbol into a character variable, so that 
> this can be turned into a factor and be displayed in the "strip" of a faceted 
> ggplot2 plot.
> 
> A very nice solution, thanks to Professor Ripley's post of Nov 16, 2008; 
> 3:13pm, visible at 
> http://r.789695.n4.nabble.com/Symbols-to-use-in-text-td874239.html and 
> subsequently http://www.fileformat.info/info/unicode/char/00b1/index.htm, is:
> 
> junk<- "\u00B1"
> print(junk)
> 
> # This works very nicely. For instance:
> 
snipped code
> 
> This works very nicely on my system, but I just wanted to enquire:
> 
> Is this machine-independent and stable?

It is font-_dependent_. It displays fine on a Mac console if that's any help, 
but it seems you probably already know that. I tested it on a 2.15.3 version of 
R on a Windows XP machine which probably has the default font settings for that 
ancient OS and it displayed fine there, too. It really depends on whether the 
default font for your OS has a glyph in that position in its font table.

> 
> Is there a "native R" way to do this?
> 
> I did this in:
> 
>> sessionInfo()
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>> 
> 
> Incidentally (and for the sake of keyword searches): Although a google search 
> initially led me to posts about expression() and plotmath, those eventually 
> had nothing to do with the solution.

That's not entirely true. The links on the ?plotmath page in the "Other 
symbols" section send you to ?points which has very instructive examples. I 
keep an annotated version of the output of TestChars(font=5) on the side of my 
desktop machine.

TestChars <- function(sign = 1, font = 1, ...)
{
   MB <- l10n_info()$MBCS
   r <- if(font == 5) { sign <- 1; c(32:126, 160:254)
   } else if(MB) 32:126 else 32:255
   if (sign == -1) r <- c(32:126, 160:255)
   par(pty = "s")
   plot(c(-1,16), c(-1,16), type = "n", xlab = "", ylab = "",
xaxs = "i", yaxs = "i",
main = sprintf("sign = %d, font = %d", sign, font))
   grid(17, 17, lty = 1) ; mtext(paste("MBCS:", MB))
   for(i in r) try( points(i%%16, i%/%16, pch = sign*i, font = font,...))
   for(i in r) try( text( (i%%16)-0.2, (i%/%16)-0.2,  as.character(i), font = 
1, cex=0.5))
}

TestChars(font = 5)

You can see on that graphic that "±" is 177 and:
> strtoi(0x00B1)
[1] 177
> as.hexmode(177)
[1] "b1"
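
The same code point can also be checked without a plot. This is just a sketch using base R conversion functions; the variable name pm is made up.

```r
pm <- "\u00B1"                  # Unicode PLUS-MINUS SIGN

code <- utf8ToInt(pm)           # integer code point of the character
code
strtoi("00B1", base = 16L)      # the same number, parsed from the hex string
intToUtf8(177) == pm            # and back from the integer to the character
```

Whether the glyph is then drawn correctly still depends on the font and the graphics device, as discussed above.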

-- 
David.
> 
> Jacob A. Wegelin


David Winsemius
Alameda, CA, USA



Re: [R] plus/minus +/- in factor; not plotmath not expression

2013-12-02 Thread David Winsemius

On Dec 2, 2013, at 1:01 PM, Duncan Murdoch wrote:

> On 02/12/2013 2:22 PM, Jacob Wegelin wrote:
>> I want to put the "plus or minus" symbol into a character variable, so that 
>> this can be turned into a factor and be displayed in the "strip" of a 
>> faceted ggplot2 plot.
>> 
>> A very nice solution, thanks to Professor Ripley's post of Nov 16, 2008; 
>> 3:13pm, visible at 
>> http://r.789695.n4.nabble.com/Symbols-to-use-in-text-td874239.html and 
>> subsequently http://www.fileformat.info/info/unicode/char/00b1/index.htm, is:
>> 
>> junk<- "\u00B1"
>> print(junk)
>> 
snipped
>> Is there a "native R" way to do this?
> 
> That is native R.

There is also plotmath's '%+-%' operator:

plot(1,1, ylab = expression( A %+-% B ), xlab=expression( C%+-% D ) )

I noticed that Jacob was using ggplot2. Generally one can eventually find ways 
to label ggplot2 output with R expressions (used in the strict R language sense 
of the word), although sometimes it has been difficult for me to find the 
methods in the help pages.
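
For the character-variable route Jacob asked about, a minimal base R sketch might look like the following. The label text and the names `pm`, `labs` and `grp` are hypothetical, and a UTF-8 capable session is assumed:

```r
# Plus/minus sign as an ordinary character, written with a Unicode escape.
pm <- "\u00B1"

# Hypothetical panel labels that contain the symbol.
labs <- paste("mean", pm, c("1 SD", "2 SD"))

# A factor whose levels carry the symbol; a faceting variable could be built
# this way, with the levels given explicitly to control panel order.
grp <- factor(c(labs[2], labs[1], labs[2]), levels = labs)

levels(grp)
table(grp)
```

Because the symbol is part of the level strings themselves, no expression() or plotmath handling should be needed for the labels.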

-- 
David.
> 
> Duncan Murdoch
>> 
>> I did this in:
>> 
>> > sessionInfo()
>> R version 2.15.3 (2013-03-01)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>> 
>> 
>> 
>> Incidentally (and for the sake of keyword searches): Although a google 
>> search initially led me to posts about expression() and plotmath, those 
>> eventually had nothing to do with the solution.
>> 
>> Jacob A. Wegelin
> 


David Winsemius
Alameda, CA, USA



Re: [R] interpretation of MDS plot in random forest

2013-12-02 Thread mbressan
thanks andy

it's a real honour for me to get a reply from you;
I'm still a bit far away from a proper grasp of the purpose of the plot...

may I ask you about a more technical (trivial) issue?
is it possible to add a legend to the MDS plot?
my problem is linking the point colours in the chart to the levels of the
factor that was used as the response to train the rf; how can I do that?

best

max

> Yes, that's part of the intention anyway.  One can also use them to do
> clustering.
>
> Best,
> Andy
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Massimo Bressan
> Sent: Monday, December 02, 2013 6:34 AM
> To: r-help@r-project.org
> Subject: [R] interpretation of MDS plot in random forest
>
> Given this general example:
>
> library(randomForest)
>
> set.seed(1)
>
> data(iris)
>
> iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE,
> keep.forest=TRUE)
>
> #varImpPlot(iris.rf)
>
> #varUsed(iris.rf)
>
> MDSplot(iris.rf, iris$Species)
>
> I’ve been reading the documentation about random forest (to the best of my -
> poor - knowledge) but I’m having trouble with the correct interpretation of
> the MDS plot and I hope someone can give me some clues
>
> What is meant by “the scaling coordinates of the proximity matrix”?
>
>
> I think I understand that the objective here is to present the distance
> among species in a parsimonious and visual way (of lower dimensionality)
>
> Is this therefore analogous to the principal components
> in a classical PCA?
>
> Are the scaling coordinates DIM 1 and DIM2 the eigenvectors of the
> proximity matrix?
>
> If that is correct, how would you find the eigenvalues for those
> eigenvectors? And what do the eigenvalues represent?
>
>
> What do these two dimensions in the plot say about the different
> iris species? Their relative distance in terms of proximity within the
> space DIM1 and DIM2?
>
> How to choose the k parameter (number of dimensions for the scaling
> coordinates)?
>
> And finally how would you explain the plot in simple terms?
>
> Thank you for any feedback
> Best regards
>



[R] names error message

2013-12-02 Thread Julie Royster
Hello wise R folks,

I ran a job to combine 2 dataframes using rbind.
I received this error message that the names were not the same

Error in match.names(clabs,names(xi)): names do not match previous names

BUT when I entered this statement

identical (names(data1[[1]]),names(data2[[2]]) )

R responded TRUE to this query, indicating the names are identical

So I am baffled.  visually checking each dataset using str they look the
same, and R says they are the same when queried,
But I still get the error when I give this command 

newname <- rbind (data1,data2)  

Any ideas?
THANKS!
Julie



Re: [R] How to reconvert binary matrix back to original numeric?

2013-12-02 Thread arun
Hi,
I couldn't reproduce the first part.
Lines1 <- readLines(textConnection("2 5 7 11
1 2 5
5 7 10 12 13")) 

Max1 <- max(as.numeric(unlist(strsplit(Lines1," "))))
t(sapply(strsplit(Lines1," "), function(x) {x1<- as.numeric(x); x2 <- numeric(Max1); x2[x1]<- 1; x2}))

#or

mat1<- as.matrix(read.table(text=Lines1,header=FALSE,fill=TRUE))
indx <- cbind(as.vector(t(row(mat1))),as.vector(t(mat1)))
indx1 <- indx[!is.na(indx[,2]),]
Binary <- matrix(0,nrow(mat1),max(mat1,na.rm=TRUE))
Binary[indx1] <- 1

apply(!!Binary,1,which)
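
The round trip can also be sketched in a self-contained way. The names `rows`, `ncols` and `back` are made up for this illustration, and the three index rows are the ones from the question. lapply is used for the way back because apply() would simplify the result to a matrix if every row happened to have the same number of ones:

```r
# Index sets from the question, one numeric vector per row.
rows <- list(c(2, 5, 7, 11), c(1, 2, 5), c(5, 7, 10, 12, 13))
ncols <- max(unlist(rows))

# Forward: one 0/1 row per index set (vapply gives columns, so transpose).
Binary <- t(vapply(rows, function(idx) {
  out <- numeric(ncols)
  out[idx] <- 1
  out
}, numeric(ncols)))

# Back: for each row, the positions that hold a 1. The result is always a list.
back <- lapply(seq_len(nrow(Binary)), function(i) which(Binary[i, ] == 1))
back
```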
A.K.






Hi 
When you have data in a text file whose rows have different lengths, for example:
2 5 7 11 
1 2 5 
5 7 10 12 13 

Then you convert them to binary by: 
BinaryI<- as(data, "matrix") 
And you got: 
0 1 0 0 1 0 1 0 0 0 1 0 0 
1 1 0 0 1 0 0 0 0 0 0 0 0 
0 0 0 0 1 0 1 0 0 1 0 1 1 

How to convert them back to 
2 5 7 11 
1 2 5 
5 7 10 12 13 

I tried as.numeric and it didn’t work, and I tried that also with 
as(unlist(mydata), "numeric") 

thanks



Re: [R] names error message

2013-12-02 Thread Sarah Goslee
Hi Julie,

On Mon, Dec 2, 2013 at 3:38 PM, Julie Royster  wrote:
> Hello wise R folks,
>
> I ran a job to combine 2 dataframes using rbind.
> I received this error message that the names were not the same
>
> Error in match.names(clabs,names(xi)): names do not match previous names
>
> BUT when I entered this statement
>
> identical (names(data1[[1]]),names(data2[[2]]) )

I'm not sure what you think you're comparing here:
> data1 <- data.frame(a=1:3, b=4:6, c=7:9)
> data2 <- data.frame(A=11:13, B=14:16, C=17:19)
> row.names(data1) <- c("A", "B", "C")

> identical (names(data1[[1]]),names(data2[[2]]) )
[1] TRUE

I initially thought you were comparing row names to column names, but
that's also not right. Instead you're taking the first list element of
data1 and comparing its name to that of the second list element of
data2, but extracting them in a way that removes the names:

> data1[[1]]
[1] 1 2 3
> data2[[2]]
[1] 14 15 16

> names(data1[[1]])
NULL
> names(data2[[2]])
NULL

Compared to:
> names(data1[1])
[1] "a"
> names(data2[2])
[1] "B"

Instead, you need to look at the column names of each, which
are most conveniently accessed with
colnames(data1) and colnames(data2)
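
As a rough sketch of that check, using the same toy data frames as above (base R only; setdiff() is just one way to list the differences):

```r
data1 <- data.frame(a = 1:3, b = 4:6, c = 7:9)
data2 <- data.frame(A = 11:13, B = 14:16, C = 17:19)

# [[ ]] extracts a bare column, which has no names, so this compares NULL to NULL.
misleading <- identical(names(data1[[1]]), names(data2[[2]]))

# Comparing the column names themselves shows the mismatch (R is case-sensitive).
same_cols <- identical(names(data1), names(data2))

# Names present in data1 but absent from data2.
missing_in_data2 <- setdiff(names(data1), names(data2))
```

Here `misleading` is TRUE while `same_cols` is FALSE, which matches the symptom in the original post.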

> R responded TRUE to this query, indicating the names are identical
>
> So I am baffled.  visually checking each dataset using str they look the
> same, and R says they are the same when queried,
> But I still get the error when I give this command
>
> newname <- rbind (data1,data2)
>
> Any ideas?

Best idea of all: provide a reproducible example, because otherwise
there's no way to tell.

dput(head(data1))

and

dput(head(data2))

and paste that into your email.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] names error message

2013-12-02 Thread William Dunlap
> I ran a job to combine 2 dataframes using rbind.
> I received this error message that the names were not the same
> Error in match.names(clabs,names(xi)): names do not match previous names

The column names of the data.frames given to rbind must all be
permutations of one another.   E.g.,
> rbind(data.frame(A=1:3,B=11:13), data.frame(B=14:17, A=4:7))
  A  B
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
but not
> rbind(data.frame(A=1:3,B=11:13), data.frame(B=14:17, C=104:107))
Error in match.names(clabs, names(xi)) : 
  names do not match previous names
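
A possible way to check this before calling rbind is sketched below. The data frames `d1` and `d2` mirror the failing example; the rename of C to A is purely hypothetical and assumes the two columns really hold the same variable:

```r
d1 <- data.frame(A = 1:3, B = 11:13)
d2 <- data.frame(B = 14:17, C = 104:107)

# rbind needs the same set of column names in both data frames (order may differ).
same_names <- setequal(names(d1), names(d2))
only_in_d2 <- setdiff(names(d2), names(d1))

# Hypothetical repair: treat column C as a mislabelled A, then combine.
d2_fixed <- d2
names(d2_fixed)[names(d2_fixed) == "C"] <- "A"
combined <- rbind(d1, d2_fixed)
combined
```

rbind matches columns by name, so the differing column order in `d2_fixed` does not matter.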

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Julie Royster
> Sent: Monday, December 02, 2013 12:39 PM
> To: r-help@r-project.org
> Subject: [R] names error message
> 
> Hello wise R folks,
> 
> I ran a job to combine 2 dataframes using rbind.
> I received this error message that the names were not the same
> 
> Error in match.names(clabs,names(xi)): names do not match previous names
> 
> BUT when I entered this statement
> 
> identical (names(data1[[1]]),names(data2[[2]]) )
> 
> R responded TRUE to this query, indicating the names are identical
> 
> So I am baffled.  visually checking each dataset using str they look the
> same, and R says they are the same when queried,
> But I still get the error when I give this command
> 
> newname <- rbind (data1,data2)
> 
> Any ideas?
> THANKS!
> Julie
> 



Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Bill
Thanks to all, still a bit of a mystery. It is for the graphing I guess but
not sure why. I don't think there is too much sense to saying that Tuesday
precedes Wednesday but there could, as some of you suggest, be
circumstances where this is useful.
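
To make the distinction discussed in this thread concrete, here is a small sketch. The vector `days` and the other names are invented for illustration, not taken from the dodgers code:

```r
days <- c("Tuesday", "Friday", "Monday", "Friday")
wk <- c("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")

f_default <- factor(days)                                # levels sorted alphabetically
f_levels  <- factor(days, levels = wk)                   # calendar order, still unordered
f_ordered <- factor(days, levels = wk, ordered = TRUE)   # a true ordered factor

levels(f_default)            # alphabetical order of the values present
as.character(sort(f_levels)) # sorts by level order, i.e. calendar order
is.ordered(f_levels)         # FALSE
is.ordered(f_ordered)        # TRUE

# Comparisons with < are only meaningful for the ordered version.
f_ordered[1] < f_ordered[2]
```

Setting the levels explicitly is enough to get plots and tables in calendar order. Declaring the factor ordered additionally permits comparisons and, as noted in the quoted replies, changes the default contrasts used in model fitting.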


On Mon, Dec 2, 2013 at 12:00 PM, Bert Gunter  wrote:

> Did you not see Mark Leeds's post?
>
> The OP apparently did not really mean R's "ordered factors" as
> produced by the R ordered() constructor; rather, he meant "factors
> with levels ordered differently than the default", for which Rich's
> answer was apropos. Mine -- and now yours -- in which we wrongly
> assumed the OP knew what he was saying by "ordered factor", was not.
>
> -- Bert
>
> On Mon, Dec 2, 2013 at 11:51 AM, Robert Baer  wrote:
> >
> > On 12/2/2013 9:35 AM, Bert Gunter wrote:
> >
> > Not true, Rich.
> >
> > The point about alphabetical ordering explains why the author likely
> > explicitly set the levels for the factor, though.
> >
> > As to why ordered factors, we may never know, but one possible explanation
> > is that at some point he was going to use statistics where he wanted to use
> > polynomial contrasts. See
> >
> > options()$contrasts
> >
> > Note that the default contrast type differs for normal factors and ordered
> > factors.
> >
> >
> >
> > z <-factor(letters[1:3],lev=letters[3:1])
> > sort(z)
> >
> > [1] c b a
> > Levels: c b a
> >
> > What you say is true only for the **default** sort order.
> >
> > (Although maybe the code author didn't realize this either)
> >
> > -- Bert
> >
> >
> > On Mon, Dec 2, 2013 at 7:24 AM, Richard M. Heiberger wrote:
> >
> > If days of the week is not an Ordered Factor, then it will be sorted
> > alphabetically.
> > Fr Mo Sa Su Th Tu We
> >
> > Rich
> >
> > On Mon, Dec 2, 2013 at 6:24 AM, Bill  wrote:
> >
> > I am reading the code below. It acts on a csv file called dodgers.csv with
> > the following variables.
> >
> >
> > print(str(dodgers))  # check the structure of the data frame
> >
> > 'data.frame':   81 obs. of  12 variables:
> >  $ month      : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1 1 ...
> >  $ day        : int  10 11 12 13 14 15 23 24 25 27 ...
> >  $ attend     : int  56000 29729 28328 31601 46549 38359 26376 44014 26345 44807 ...
> >  $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7 1 ...
> >  $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11 3 3 3 10 ...
> >  $ temp       : int  67 58 57 54 57 65 60 63 64 66 ...
> >  $ skies      : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1 ...
> >  $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
> >  $ cap        : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> >  $ shirt      : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> >  $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
> >  $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> > NULL
> >
> > I don't understand why the author of the code decided to make the factor
> > days_of_week into an ordered factor. Anyone know why this should be done?
> > Thank you.
> >
> > Here is the code:
> >
> > # Predictive Model for Los Angeles Dodgers Promotion and Attendance
> >
> > library(car)  # special functions for linear regression
> > library(lattice)  # graphics package
> >
> > # read in data and create a data frame called dodgers
> > dodgers <- read.csv("dodgers.csv")
> > print(str(dodgers))  # check the structure of the data frame
> >
> > # define an ordered day-of-week variable
> > # for plots and data summaries
> > dodgers$ordered_day_of_week <- with(data=dodgers,
> >   ifelse ((day_of_week == "Monday"),1,
> >   ifelse ((day_of_week == "Tuesday"),2,
> >   ifelse ((day_of_week == "Wednesday"),3,
> >   ifelse ((day_of_week == "Thursday"),4,
> >   ifelse ((day_of_week == "Friday"),5,
> >   ifelse ((day_of_week == "Saturday"),6,7)))))))
> > dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
> > levels=1:7,
> > labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
> >
> > # exploratory data analysis with standard graphics: attendance by day of week
> > with(data=dodgers,plot(ordered_day_of_week, attend/1000,
> > xlab = "Day of Week", ylab = "Attendance (thousands)",
> > col = "violet", las = 1))
> >
> > # when do the Dodgers use bobblehead promotions
> > with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on Tuesday
> >
> > # define an ordered month variable
> > # for plots and data summaries
> > dodgers$ordered_month <- with(data=dodgers,
> >   ifelse ((month == "APR"),4,
> >   ifelse ((month == "MAY"),5,
> >   ifelse ((month == "JUN"),6,
> >   ifelse ((month == "JUL"),7,
> >   ifelse ((month == "AUG"),8,
> >   ifelse ((month == "SEP"),9,10)))))))
> > dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
> > labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))
> >
> > # explorator

[R] ifelse -does it "manage the indexing"?

2013-12-02 Thread Bill
ifelse ((day_of_week == "Monday"),1,
  ifelse ((day_of_week == "Tuesday"),2,
  ifelse ((day_of_week == "Wednesday"),3,
  ifelse ((day_of_week == "Thursday"),4,
  ifelse ((day_of_week == "Friday"),5,
  ifelse ((day_of_week == "Saturday"),6,7))))))


  In code like the above, day_of_week is a vector and so day_of_week ==
"Monday" will result in a boolean vector. Suppose day_of_week is Monday,
Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
True,False,False,False. I think that ifelse will test the first element and
it will generate a 1. At this point it will not have run day_of_week ==
"Tuesday" yet. Then it will test the second element of day_of_week and it
will be false and this will cause it to evaluate day_of_week == "Tuesday".
My question would be, does the evaluation of day_of_week == "Tuesday"
result in the generation of an entire boolean vector (which would be in
this case False,False,False,True) or does the ifelse "manage the indexing"
so that it only tests the second element of the original vector (which is
Thursday) and for that matter does it therefore not even bother to generate
the first boolean vector I mentioned above (True,False,False,False) but
rather just checks the first element?
  Not sure if I have explained this well but if you understand I would
appreciate a reply.
  Thanks.



Re: [R] ifelse -does it "manage the indexing"?

2013-12-02 Thread Duncan Murdoch

On 13-12-02 7:33 PM, Bill wrote:

ifelse ((day_of_week == "Monday"),1,
   ifelse ((day_of_week == "Tuesday"),2,
   ifelse ((day_of_week == "Wednesday"),3,
   ifelse ((day_of_week == "Thursday"),4,
   ifelse ((day_of_week == "Friday"),5,
   ifelse ((day_of_week == "Saturday"),6,7))))))


   In code like the above, day_of_week is a vector and so day_of_week ==
"Monday" will result in a boolean vector. Suppose day_of_week is Monday,
Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
True,False,False,False. I think that ifelse will test the first element and
it will generate a 1. At this point it will not have run day_of_week ==
"Tuesday" yet. Then it will test the second element of day_of_week and it
will be false and this will cause it to evaluate day_of_week == "Tuesday".
My question would be, does the evaluation of day_of_week == "Tuesday"
result in the generation of an entire boolean vector (which would be in
this case False,False,False,True) or does the ifelse "manage the indexing"
so that it only tests the second element of the original vector (which is
Thursday) and for that matter does it therefore not even bother to generate
the first boolean vector I mentioned above (True,False,False,False) but
rather just checks the first element?
   Not sure if I have explained this well but if you understand I would
appreciate a reply.


See the help for the function.  If any element of the test is true, the 
full first vector will be evaluated.  If any element is false, the 
second one will be evaluated.  There are no shortcuts of the kind you 
describe.
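
A small experiment can make this visible. The helper `is_day` below is a made-up wrapper around `==` that records how many elements it was asked to compare; it is only a sketch, not anything from the ifelse source:

```r
x <- c("Monday", "Thursday", "Friday", "Tuesday")
seen <- integer(0)   # lengths of the vectors each comparison received

# Hypothetical wrapper: behaves like v == day, but logs length(v) as a side effect.
is_day <- function(v, day) {
  seen <<- c(seen, length(v))
  v == day
}

res <- ifelse(is_day(x, "Monday"), 1,
       ifelse(is_day(x, "Tuesday"), 2, 7))

res    # one result per element of x
seen   # each comparison saw the whole vector
```

Each test is evaluated exactly once, on the full vector. The inner ifelse runs because at least one element of the outer test is FALSE, and it too receives all four elements rather than only the ones that failed the outer test.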


Duncan Murdoch



Re: [R] ifelse -does it "manage the indexing"?

2013-12-02 Thread Bill
It seems so inefficient. I mean the whole first vector will be evaluated.
Then if the second if is run the whole vector will be evaluated again. Then
if the next if is run the whole vector will be evaluated again. And so on.
And this could be only to test the first element (if it is false for each
if statement). Then this would be repeated again and again. Is that really
the way it works? Or am I not thinking clearly?


On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch wrote:

> On 13-12-02 7:33 PM, Bill wrote:
>
>> ifelse ((day_of_week == "Monday"),1,
>>ifelse ((day_of_week == "Tuesday"),2,
>>ifelse ((day_of_week == "Wednesday"),3,
>>ifelse ((day_of_week == "Thursday"),4,
>>ifelse ((day_of_week == "Friday"),5,
>>ifelse ((day_of_week == "Saturday"),6,7))))))
>>
>>
>>In code like the above, day_of_week is a vector and so day_of_week ==
>> "Monday" will result in a boolean vector. Suppose day_of_week is Monday,
>> Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
>> True,False,False,False. I think that ifelse will test the first element
>> and
>> it will generate a 1. At this point it will not have run day_of_week ==
>> "Tuesday" yet. Then it will test the second element of day_of_week and it
>> will be false and this will cause it to evaluate day_of_week == "Tuesday".
>> My question would be, does the evaluation of day_of_week == "Tuesday"
>> result in the generation of an entire boolean vector (which would be in
>> this case False,False,False,True) or does the ifelse "manage the indexing"
>> so that it only tests the second element of the original vector (which is
>> Thursday) and for that matter does it therefore not even bother to
>> generate
>> the first boolean vector I mentioned above (True,False,False,False) but
>> rather just checks the first element?
>>Not sure if I have explained this well but if you understand I would
>> appreciate a reply.
>>
>
> See the help for the function.  If any element of the test is true, the
> full first vector will be evaluated.  If any element is false, the second
> one will be evaluated.  There are no shortcuts of the kind you describe.
>
> Duncan Murdoch
>
>



Re: [R] ifelse -does it "manage the indexing"?

2013-12-02 Thread Duncan Murdoch

On 13-12-02 7:49 PM, Bill wrote:

It seems so inefficient. I mean the whole first vector will be
evaluated. Then if the second if is run the whole vector will be
evaluated again. Then if the next if is run the whole vector will be
evaluated again. And so on. And this could be only to test the first
element (if it is false for each if statement). Then this would be
repeated again and again. Is that really the way it works? Or am I not
thinking clearly?


Read the manual.

Duncan Murdoch




On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch wrote:

On 13-12-02 7:33 PM, Bill wrote:

ifelse ((day_of_week == "Monday"),1,
ifelse ((day_of_week == "Tuesday"),2,
ifelse ((day_of_week == "Wednesday"),3,
ifelse ((day_of_week == "Thursday"),4,
ifelse ((day_of_week == "Friday"),5,
ifelse ((day_of_week == "Saturday"),6,7))))))


In code like the above, day_of_week is a vector and so
day_of_week ==
"Monday" will result in a boolean vector. Suppose day_of_week is
Monday,
Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
True,False,False,False. I think that ifelse will test the first
element and
it will generate a 1. At this point it will not have run
day_of_week ==
"Tuesday" yet. Then it will test the second element of
day_of_week and it
will be false and this will cause it to evaluate day_of_week ==
"Tuesday".
My question would be, does the evaluation of day_of_week ==
"Tuesday"
result in the generation of an entire boolean vector (which
would be in
this case False,False,False,True) or does the ifelse "manage the
indexing"
so that it only tests the second element of the original vector
(which is
Thursday) and for that matter does it therefore not even bother
to generate
the first boolean vector I mentioned above
(True,False,False,False) but
rather just checks the first element?
Not sure if I have explained this well but if you understand
I would
appreciate a reply.


See the help for the function.  If any element of the test is true,
the full first vector will be evaluated.  If any element is false,
the second one will be evaluated.  There are no shortcuts of the
kind you describe.

Duncan Murdoch






Re: [R] ifelse -does it "manage the indexing"?

2013-12-02 Thread Bill
Hi. Thanks. Which part? I looked at the below but I don't think it clearly
addresses the nested ifelse situation.


ifelse {base}                                        R Documentation

Conditional Element Selection

Description

ifelse returns a value with the same shape as test which is filled with
elements selected from either yes or no depending on whether the element of
test is TRUE or FALSE.

Usage

ifelse(test, yes, no)

Arguments

test
    an object which can be coerced to logical mode.
yes
    return values for true elements of test.
no
    return values for false elements of test.

Details

If yes or no are too short, their elements are recycled. yes will be
evaluated if and only if any element of test is true, and analogously for no.

Missing values in test give missing values in the result.

Value

A vector of the same length and attributes (including dimensions and "class")
as test and data values from the values of yes or no. The mode of the
answer will be coerced from logical to accommodate first any values taken
from yes and then any values taken from no.

Warning

The mode of the result may depend on the value of test (see the examples),
and the class attribute (see oldClass) of the result is taken from test and
may be inappropriate for the values selected from yes and no.

Sometimes it is better to use a construction such as

  (tmp <- yes; tmp[!test] <- no[!test]; tmp)

, possibly extended to handle missing values in test.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language.
Wadsworth & Brooks/Cole.

See Also

if.

Examples

x <- c(6:-4)
sqrt(x)  #- gives warning
sqrt(ifelse(x >= 0, x, NA))  # no warning

## Note: the following also gives the warning !
ifelse(x >= 0, sqrt(x), NA)

## example of different return modes:
yes <- 1:3
no <- pi^(0:3)
typeof(ifelse(NA, yes, no))# logical
typeof(ifelse(TRUE, yes, no))  # integer
typeof(ifelse(FALSE, yes, no)) # double



On Mon, Dec 2, 2013 at 5:09 PM, Duncan Murdoch wrote:

> On 13-12-02 7:49 PM, Bill wrote:
>
>> It seems so inefficient. I mean the whole first vector will be
>> evaluated. Then if the second if is run the whole vector will be
>> evaluated again. Then if the next if is run the whole vector will be
>> evaluated again. And so on. And this could be only to test the first
>> element (if it is false for each if statement). Then this would be
>> repeated again and again. Is that really the way it works? Or am I not
>> thinking clearly?
>>
>
> Read the manual.
>
> Duncan Murdoch
>
>
>>
>> On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch wrote:
>>
>> On 13-12-02 7:33 PM, Bill wrote:
>>
>> ifelse ((day_of_week == "Monday"),1,
>> ifelse ((day_of_week == "Tuesday"),2,
>> ifelse ((day_of_week == "Wednesday"),3,
>> ifelse ((day_of_week == "Thursday"),4,
>> ifelse ((day_of_week == "Friday"),5,
>> ifelse ((day_of_week == "Saturday"),6,7))))))
>>
>>
>> In code like the above, day_of_week is a vector and so
>> day_of_week ==
>> "Monday" will result in a boolean vector. Suppose day_of_week is
>> Monday,
>> Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
>> True,False,False,False. I think that ifelse will test the first
>> element and
>> it will generate a 1. At this point it will not have run
>> day_of_week ==
>> "Tuesday" yet. Then it will test the second element of
>> day_of_week and it
>> will be false and this will cause it to evaluate day_of_week ==
>> "Tuesday".
>> My question would be, does the evaluation of day_of_week ==
>> "Tuesday"
>> result in the generation of an entire boolean vector (which
>> would be in
>> this case False,False,False,True) or does the ifelse "manage the
>> indexing"
>> so that it only tests the second element of the original vector
>> (which is
>> Thursday) and for that matter does it therefore not even bother
>> to generate
>> the first boolean vector I mentioned above
>> (True,False,False,False) but
>> rather just checks the first element?
>> Not sure if I have explained this well but if you understand
>> I would
>> appreciate a reply.
>>
>>
>> See the help for the function.  If any element of the test is true,
>> the full first vector will be evaluated.  If any element is false,
>> the second one will be evaluated.  There are no shortcuts of the
>> kind you describe.
>>
>> Duncan Murdoch
>>
>>
>>
>


Re: [R] ifelse -does it "manage the indexing"?

2013-12-02 Thread William Dunlap
> It seems so inefficient.

But ifelse knows nothing about the expressions given
as its second and third arguments -- it only sees their
values after they are evaluated.  Even if it could see the
expressions, it would not be able to assume that f(x[i])
is the same as f(x)[i] or things like
   ifelse(x>0, cumsum(x), cumsum(-x))
would not work.
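
A short sketch of why the two are not interchangeable, with a made-up input vector (`whole` and `piece` are illustrative names):

```r
x <- c(3, -1, 2, -4)
pos <- x > 0

# What ifelse does: evaluate both arguments on all of x, then pick elementwise.
whole <- ifelse(pos, cumsum(x), cumsum(-x))

# What an "indexing-managed" version would compute: f applied to subsets only.
piece <- numeric(length(x))
piece[pos]  <- cumsum(x[pos])
piece[!pos] <- cumsum(-x[!pos])

whole   # cumulative sums over the full vector, selected by sign
piece   # cumulative sums within each subset
```

The two results differ because cumsum depends on all preceding elements, so f(x)[i] and f(x[i]) are not the same thing here.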

You can avoid computing all of f(x) and then extracting
a few elements from it by doing something like
   x <- c("Wednesday", "Monday", "Wednesday")
   z1 <- character(length(x))
   z1[x=="Monday"] <- "Mon"
   z1[x=="Tuesday"] <- "Tue"
   z1[x=="Wednesday"] <- "Wed"
or
   LongDayNames <- c("Monday","Tuesday","Wednesday")
   ShortDayNames <- c("Mon", "Tue", "Wed")
   z2 <- character(length(x))
   for(i in seq_along(LongDayNames)) {
      z2[x==LongDayNames[i]] <- ShortDayNames[i]
   }

To avoid the repeated x==value[i] you can use match(x, values).
   z3 <- ShortDayNames[match(x, LongDayNames)]

z1, z2, and z3 are identical  character vectors.

Or, you can use factors.
   > factor(x, levels=LongDayNames, labels=ShortDayNames)
   [1] Wed Mon Wed
   Levels: Mon Tue Wed
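
Applied to the seven-day recoding from the original question, the match and factor approaches might be sketched like this. The names `long`, `short`, `day_num` and `day_fac` are made up, and the short labels follow the ones used in the dodgers code:

```r
day_of_week <- c("Monday", "Thursday", "Friday", "Tuesday")

long  <- c("Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday")
short <- c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun")

# One lookup replaces the six nested ifelse calls.
day_num <- match(day_of_week, long)

# Or go straight to a factor with calendar-ordered levels and short labels.
day_fac <- factor(day_of_week, levels = long, labels = short)

day_num
day_fac
```

One difference from the nested ifelse version: there, any value that is not Monday to Saturday becomes 7, whereas match() and factor() give NA for an unrecognised day name.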

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf
> Of Bill
> Sent: Monday, December 02, 2013 4:50 PM
> To: Duncan Murdoch
> Cc: r-help@r-project.org
> Subject: Re: [R] ifelse -does it "manage the indexing"?
> 
> It seems so inefficient. I mean the whole first vector will be evaluated.
> Then if the second if is run the whole vector will be evaluated again. Then
> if the next if is run the whole vector will be evaluated again. And so on.
> And this could be only to test the first element (if it is false for each
> if statement). Then this would be repeated again and again. Is that really
> the way it works? Or am I not thinking clearly?
> 
> 
> On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch
> wrote:
> 
> > On 13-12-02 7:33 PM, Bill wrote:
> >
> >> ifelse ((day_of_week == "Monday"),1,
> >>ifelse ((day_of_week == "Tuesday"),2,
> >>ifelse ((day_of_week == "Wednesday"),3,
> >>ifelse ((day_of_week == "Thursday"),4,
> >>ifelse ((day_of_week == "Friday"),5,
> >>ifelse ((day_of_week == "Saturday"),6,7))))))
> >>
> >>
> >>In code like the above, day_of_week is a vector and so day_of_week ==
> >> "Monday" will result in a boolean vector. Suppose day_of_week is Monday,
> >> Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
> >> True,False,False,False. I think that ifelse will test the first element
> >> and
> >> it will generate a 1. At this point it will not have run day_of_week ==
> >> "Tuesday" yet. Then it will test the second element of day_of_week and it
> >> will be false and this will cause it to evaluate day_of_week == "Tuesday".
> >> My question would be, does the evaluation of day_of_week == "Tuesday"
> >> result in the generation of an entire boolean vector (which would be in
> >> this case False,False,False,True) or does the ifelse "manage the indexing"
> >> so that it only tests the second element of the original vector (which is
> >> Thursday) and for that matter does it therefore not even bother to
> >> generate
> >> the first boolean vector I mentioned above (True,False,False,False) but
> >> rather just checks the first element?
> >>Not sure if I have explained this well but if you understand I would
> >> appreciate a reply.
> >>
> >
> > See the help for the function.  If any element of the test is true, the
> > full first vector will be evaluated.  If any element is false, the second
> > one will be evaluated.  There are no shortcuts of the kind you describe.
> >
> > Duncan Murdoch
> >
> >
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] Fwd: Intepreting lm() results with factor

2013-12-02 Thread David Gwenzi
Dear all

I have observations from 4 different classes, and the between-class
*variance* is so high that I decided to run a model without pooling the
*variance*. I first used the following code:
   model<-lm(y~x+factor(class))
and got the following output:
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     52.41405   17.38161   3.015  0.00658 **
x                0.27679    0.07387   3.747  0.00119 **
factor(class)2  92.68083   32.26645   2.872  0.00912 **
factor(class)3 197.82029   33.24916   5.950 6.63e-06 ***
factor(class)4 105.61266   55.18373   1.914  0.06937 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 43.07 on 21 degrees of freedom
Multiple R-squared:  0.9206,    Adjusted R-squared:  0.9055
F-statistic: 60.91 on 4 and 21 DF,  p-value: 2.976e-11

My understanding of this output is that class 1 is used as the baseline
(constant) and that each other class's p-value indicates, for example, whether
the dependent value in class 2 is significantly different from that of class 1.
I then ran the model again, but without a constant, i.e.
model<-lm(y~x+factor(class)-1)
and got the following output:
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
x                0.27679    0.07387   3.747  0.00119 **
factor(class)1  52.41405   17.38161   3.015  0.00658 **
factor(class)2 145.09488   39.42651   3.680  0.00139 **
factor(class)3 250.23434   40.61189   6.162 4.11e-06 ***
factor(class)4 158.02672   64.09549   2.465  0.02238 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 43.07 on 21 degrees of freedom
Multiple R-squared:  0.9801,    Adjusted R-squared:  0.9754
F-statistic: 207.1 on 5 and 21 DF,  p-value: < 2.2e-16

Can somebody please tell me how to interpret this one? What do the
classes' p-values mean? Do they merely show whether the classes significantly
contribute to the model, or whether they are significantly different from
the overall mean? If one class had a p-value > 0.05, would that mean
the observations from that class are not significantly
contributing to the model?

Thanks in advance

David
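
A possible reading of the two outputs above, offered as a sketch rather than
a definitive answer: dropping the constant only re-parameterises the same
fit. In the posted tables, 52.41405 + 92.68083 = 145.09488, so each class
coefficient in the second model is that class's own intercept (the expected
y at x = 0), and its p-value tests that intercept against zero rather than
against class 1 or an overall mean. The code below uses simulated data (the
names cls, x, y and all numbers are made up for illustration) to check the
relationship.

```r
set.seed(1)
cls <- factor(rep(1:4, each = 6))
x   <- runif(24, 0, 100)
y   <- 50 + 0.3 * x + c(0, 90, 200, 100)[as.integer(cls)] + rnorm(24, sd = 5)

fit_int   <- lm(y ~ x + cls)       # class 1 is the baseline (intercept)
fit_noint <- lm(y ~ x + cls - 1)   # one intercept per class, no baseline

b_int <- coef(fit_int)
b_no  <- coef(fit_noint)

# Each class intercept equals the baseline intercept plus that class's offset.
rebuilt <- c(b_int[["(Intercept)"]],
             b_int[["(Intercept)"]] + b_int[c("cls2", "cls3", "cls4")])
same_coefs <- all.equal(unname(rebuilt),
                        unname(b_no[c("cls1", "cls2", "cls3", "cls4")]))

# Same fitted values; only the parameterisation differs.
same_fit <- all.equal(fitted(fit_int), fitted(fit_noint))
same_coefs
same_fit
```

The residual standard error is identical in both posted outputs for the same
reason. The R-squared values differ because, without an intercept, R
computes R-squared relative to zero instead of relative to the mean of y, so
the two figures are not comparable.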



[R] big matrix in r

2013-12-02 Thread krys22
I have a problem with a big matrix: after creating the matrix, many
rows and columns are invisible because of the matrix's large dimensions.
Could you help me get my complete matrix? Is there any function or other
solution to resolve this problem?
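
If "invisible" here means that print() stops partway through the matrix,
which is only a guess since no example was posted, the limit involved would
be the max.print option. A minimal sketch with a small made-up matrix m:

```r
m <- matrix(seq_len(200), nrow = 20, ncol = 10)

old <- options(max.print = 50)      # artificially small limit for the demo
short <- capture.output(print(m))   # printing stops after 50 entries

options(max.print = 1000)           # raise the limit above length(m)
full <- capture.output(print(m))    # now every row is printed

options(old)                        # restore the previous setting
length(short) < length(full)
dim(m)   # the object itself always holds all rows and columns
```

Only the printed display is truncated; the matrix is complete in memory
either way, and it can also be inspected piecewise, for example with
m[1:5, ] or head(m).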






[R] Setting contrasts

2013-12-02 Thread David C. Howell
I have been having trouble understanding the difference between
"contrasts(Group) <- contr.sum" and options(contrasts =
c("contr.sum","contr.poly")). They both seem to say that they have done
what I want, but only the latter works.


The reason the question arises is that, when using Fox's Anova, it is
important to use sum contrasts. So I wrote the following code, with the
result commented in.



# Start R from scratch in case old contrasts are left over -- they aren't
# removed with rm()
# Note that the data are balanced, so almost any solution SHOULD give the same
# result.
Eysenck <- read.table("http://www.uvm.edu/~dhowell/methods7/DataFiles/Tab13-2.dat",
                      header = TRUE)
Eysenck$subj <- factor(1:100)
Eysenck$Condition <- factor(Eysenck$Condition, levels = 1:5,
                            labels = c("Counting", "Rhyming", "Adjective",
                                       "Imagery", "Intention"))
Eysenck$Age <- factor(Eysenck$Age, levels = 1:2, labels = c("Old","Young"))
attach(Eysenck)
result <- anova(aov(Recall~Condition*Age, data = Eysenck))
print(result)  # This is the correct result
  #Analysis of Variance Table
  #
  #  Response: Recall
  #                 Df  Sum Sq Mean Sq F value    Pr(>F)
  #  Condition       4 1514.94  378.73 47.1911 < 2.2e-16 ***
  #  Age             1  240.25  240.25 29.9356 3.981e-07 ***
  #  Condition:Age   4  190.30   47.57  5.9279 0.0002793 ***
  #  Residuals      90  722.30    8.03
getOption("contrasts")
  #  unordered         ordered
  #  "contr.treatment" "contr.poly"
# Note that these are the default treatment contrasts--bad, bad, but OK here.


# Leave the contrasts alone for now
library(car)
resultsCar1 <- lm(Recall~Age*Condition, data = Eysenck )
type2 <- Anova(resultsCar1, type = "II") # This is OK, but the next is very wrong
print(type2)
  #Anova Table (Type II tests)
  #
  #Response: Recall
  #               Sum Sq Df F value    Pr(>F)
  #Age            240.25  1 29.9356 3.981e-07 ***
  #Condition     1514.94  4 47.1911 < 2.2e-16 ***
  #Age:Condition  190.30  4  5.9279 0.0002793 ***
  #Residuals      722.30 90

resultsCar2 <- lm(Recall~Age*Condition, data = Eysenck )
type3 <- Anova(resultsCar2, type = "III")
print(type3)   #  This is still wrong --Fox says I need sum contrasts
  #Anova Table (Type III tests)
  #
  #  Response: Recall
  #                Sum Sq Df F value    Pr(>F)
  #  (Intercept)   490.00  1 61.0550  9.85e-12 ***
  #  Age             1.25  1  0.1558 0.6940313
  #  Condition     351.52  4 10.9500  2.80e-07 ***  # Hmmm! why do we still have treatment contrasts
  #  Age:Condition 190.30  4  5.9279 0.0002793 ***  ## No LUCK!!
  #  Residuals     722.30 90
getOption("contrasts")
  #  unordered         ordered
  #  "contr.treatment" "contr.poly"
contrasts(Condition) <- contr.sum
contrasts(Age) <- contr.sum
contrasts(Condition); contrasts(Age)
  # Yup, we see sum contrasts!
  #            [,1] [,2] [,3] [,4]
  #  Counting     1    0    0    0
  #  Rhyming      0    1    0    0
  #  Adjective    0    0    1    0
  #  Imagery      0    0    0    1
  #  Intention   -1   -1   -1   -1
  #        [,1]
  #  Old      1
  #  Young   -1
# BUT!!
resultsCar3 <- lm(Recall~Age*Condition, data = Eysenck )
type3 <- Anova(resultsCar3, type = "III")
print(type3)
  #Anova Table (Type III tests)
  #
  #Response: Recall
  #               Sum Sq Df F value    Pr(>F)
  #(Intercept)    490.00  1 61.0550  9.85e-12 ***
  #Age              1.25  1  0.1558 0.6940313
  #Condition      351.52  4 10.9500  2.80e-07 ***
  #Age:Condition  190.30  4  5.9279 0.0002793 ***
  #Residuals      722.30 90

## Damn! We are still wrong even though the above shows sum contrasts

## So we do it another way
options(contrasts = c("contr.sum","contr.poly"))
getOption("contrasts")
  #[1] "contr.sum" "contr.poly"
resultsCar4 <- lm(Recall~Age*Condition, data = Eysenck )
type4 <- Anova(resultsCar4, type = "III")
print(type4)  # Now we're back where we should be
  #Anova Table (Type III tests)
  #
  #Response: Recall
  #               Sum Sq Df   F value    Pr(>F)
  #(Intercept)   13479.2  1 1679.5361 < 2.2e-16 ***
  #Age             240.2  1   29.9356 3.981e-07 ***
  #Condition      1514.9  4   47.1911 < 2.2e-16 ***
  #Age:Condition   190.3  4    5.9279 0.0002793 ***
  #Residuals       722.3 90
## Now it works just fine.
## So what is the difference between setting contrasts individually and setting
## them through the options?

I get similar results if I use drop1, but perhaps that is what Fox did also.



Re: [R] Setting contrasts

2013-12-02 Thread Joshua Wiley
Hi David,

You attach the dataset, which creates a copy of it.  You set the contrasts
on the copy, but in your models you reference the dataset explicitly.  You
should set the contrasts directly in the data (and, for your sake and your
students', I would encourage not using attach() ).

contrasts(Eysenck$Condition) <- contr.sum

Setting the global option works because it applies to everything R does,
even to the dataset where contrasts were not set.
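
The point above can be checked on a toy data frame. This is a sketch under
stated assumptions: the data frame d and its columns y and grp are invented
for illustration, and R's default contrast options are set explicitly so the
comparison is reproducible.

```r
old <- options(contrasts = c("contr.treatment", "contr.poly"))  # R's defaults

d <- data.frame(y   = c(1, 2, 3, 4, 5, 6),
                grp = factor(rep(c("a", "b", "c"), each = 2)))

attach(d)
contrasts(grp) <- contr.sum        # modifies a copy named grp, not d$grp
detach(d)
untouched <- is.null(attr(d$grp, "contrasts"))
untouched                          # the column inside d has no contrasts set

fit_untouched <- lm(y ~ grp, data = d)   # still uses treatment contrasts
contrasts(d$grp) <- contr.sum            # set the contrasts in the data itself
fit_sum <- lm(y ~ grp, data = d)         # now uses sum-to-zero contrasts

coef(fit_untouched)[["(Intercept)"]]  # mean of the baseline group "a"
coef(fit_sum)[["(Intercept)"]]        # mean of the three group means
options(old)
```

With treatment contrasts the intercept is the mean of the first group; with
sum contrasts it is the average of the group means, which is what the Type
III tests in the thread rely on.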

Cheers,

Josh





Re: [R] ifelse -does it "manage the indexing"?

2013-12-02 Thread Bill
Duncan,

Your solution seems so much simpler. I was reading The Art of R Programming
and it says the following:


g <- c("M","F","F","I","M","M","F")

ifelse(g == "M", 1, ifelse(g == "F", 2, 3))

[1] 1 2 2 3 1 1 2

What actually happens in that nested ifelse()? Let’s take a careful look.
First, for the sake of concreteness, let’s find what the formal argument
names are in the function ifelse():

args(ifelse)

function (test, yes, no)

NULL

Remember, for each element of test that is true, the function evaluates to
the corresponding element in yes. Similarly, if test[1] is false, the
function evaluates to no[1]. All values so generated are returned together
in a vector.
**
MY COMMENT: So the first argument, the condition, seems to be evaluated in
full to the complete boolean vector.

In our case here, R will execute the outer ifelse() call first, in which test
is g == "M", and yes is 1 (recycled); no will (later) be the result of
executing ifelse(g == "F", 2, 3). Now since test[1] is true, we generate
yes[1], which is 1. So, the first element of the return value of our outer
call will be 1.

Next R will evaluate test[2]. That is false, so R needs to find no[2]. R now
needs to execute the inner ifelse() call. It hasn’t done so before, because
it hasn’t needed it until now. R uses the principle of lazy evaluation,
meaning that an expression is not computed until it is needed.
***
MY COMMENT: The above sentence seems to be relevant.
**

R will now evaluate ifelse(g == "F", 2, 3), yielding (3,2,2,3,3,3,2); this
is no for the outer ifelse() call, so the latter’s second return element
will be the second element of (3,2,2,3,3,3,2), which is 2.
***
MY COMMENT: So this is evaluated completely at least this once.
*


When the outer ifelse() call gets to test[4], it will see that value to be
false and thus will return no[4]. Since R had already computed no, it has
the value needed, which is 3.
***
MY COMMENT: I am not sure whether the implication here is that the indexing
is managed or not.
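
One way to probe this empirically is to count how often the inner comparison
actually runs. The sketch below is not from the book or the thread; the
helper is_f() and the counter n_calls are hypothetical names introduced only
for the experiment.

```r
g <- c("M", "F", "F", "I", "M", "M", "F")

n_calls <- 0
is_f <- function(x) {
  n_calls <<- n_calls + 1   # count how often the comparison is computed
  x == "F"                  # always returns a full-length logical vector
}

res <- ifelse(g == "M", 1, ifelse(is_f(g), 2, 3))
res
calls_mixed <- n_calls      # number of times the inner comparison ran

n_calls <- 0
all_m <- rep("M", 4)
res_all_m <- ifelse(all_m == "M", 1, ifelse(is_f(all_m), 2, 3))
calls_all_m <- n_calls      # inner call is skipped when no element is FALSE

c(calls_mixed = calls_mixed, calls_all_m = calls_all_m)
```

In the mixed case the inner comparison runs once, on the whole vector, not
once per false element. Laziness applies to the argument as a whole: the no
expression is either never evaluated or evaluated in full, which is
consistent with Duncan Murdoch's description earlier in the thread.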


On Mon, Dec 2, 2013 at 5:16 PM, William Dunlap  wrote:

> > It seems so inefficient.
>
> But ifelse knows nothing about the expressions given
> as its second and third arguments -- it only sees their
> values after they are evaluated.  Even if it could see the
> expressions, it would not be able to assume that f(x[i])
> is the same as f(x)[i] or things like
>ifelse(x>0, cumsum(x), cumsum(-x))
> would not work.
>
> You can avoid the computing all of f(x) and then extracting
> a few elements from it by doing something like
>x <- c("Wednesday", "Monday", "Wednesday")
>z1 <- character(length(x))
>z1[x=="Monday"] <- "Mon"
>z1[x=="Tuesday"] <- "Tue"
>z1[x=="Wednesday"] <- "Wed"
> or
>LongDayNames <- c("Monday","Tuesday","Wednesday")
>ShortDayNames <- c("Mon", "Tue", "Wed")
>z2 <- character(length(x))
>for(i in seq_along(LongDayNames)) {
>   z2[x==LongDayNames[i]] <- ShortDayNames[i]
>}
>
> To avoid the repeated x==value[i] you can use match(x, values).
>z3 <- ShortDayNames[match(x, LongDayNames)]
>
> z1, z2, and z3 are identical  character vectors.
>
> Or, you can use factors.
>> factor(x, levels=LongDayNames, labels=ShortDayNames)
>[1] Wed Mon Wed
>Levels: Mon Tue Wed
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Bill
> > Sent: Monday, December 02, 2013 4:50 PM
> > To: Duncan Murdoch
> > Cc: r-help@r-project.org
> > Subject: Re: [R] ifelse -does it "manage the indexing"?
> >
> > It seems so inefficient. I mean the whole first vector will be evaluated.
> > Then if the second if is run the whole vector will be evaluated again.
> Then
> > if the next if is run the whole vector will be evaluted again. And so on.
> > And this could be only to test the first element (if it is false for each
> > if statement). Then this would be repeated again and again. Is that
> really
> > the way it works? Or am I not thinking clearly?
> >
> >
> > On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch
> > wrote:
> >
> > > On 13-12-02 7:33 PM, Bill wrote:
> > >
> > >> ifelse ((day_of_week == "Monday"),1,
> > >>ifelse ((day_of_week == "Tuesday"),2,
> > >>ifelse ((day_of_week == "Wednesday"),3,
> > >>ifelse ((day_of_week == "Thursday"),4,
> > >>ifelse ((day_of_week == "Friday"),5,
> > >>ifelse ((day_of_week == "Saturday"),6,7)))
> > >>
> > >>
> > >>In code like the above, day_of_week is a vector and so day_of_week
> ==
> > >> "Monday" will result in a boolean vector. Suppose day_of_week is
> Monday,
> > >> Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
> > >> True,False,False,False. I think that ifelse will test the first
> element
> > >> and
> > >> it will generate a 1. At this point it will not have run day_of_week
> ==
> > >> "Tue

Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Duncan Mackay
Hi Bill

eg

> colours =  1:8
> coloursf =  factor(1:8)
> colourso =  ordered(1:8)
> str(coloursf)
 Factor w/ 8 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8
> str(colourso)
 Ord.factor w/ 8 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8

coloursf2 <- factor(1:8, levels = 8:1)
str(coloursf2)

Duncan

Duncan
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au


Ordered factors are used in MASS::polr and GEE for polytomous logistic
regression.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Bill
Sent: Monday, 2 December 2013 21:24
To: r-help@r-project.org
Subject: [R] why change days of the week from a factor to an ordered factor?

I am reading the code below. It acts on a csv file called dodgers.csv with
the following variables.


> print(str(dodgers))  # check the structure of the data frame
'data.frame':   81 obs. of  12 variables:
 $ month  : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1
1 ...
 $ day: int  10 11 12 13 14 15 23 24 25 27 ...
 $ attend : int  56000 29729 28328 31601 46549 38359 26376 44014 26345
44807 ...
 $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7
1 ...
 $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11
3 3 3 10 ...
 $ temp   : int  67 58 57 54 57 65 60 63 64 66 ...
 $ skies  : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1
...
 $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
 $ cap: Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ shirt  : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
 $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
 $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
NULL
>

I don't understand why the author of the code decided to make the factor
days_of_week into an ordered factor. Anyone know why this should be done?
Thank you.

Here is the code:

# Predictive Model for Los Angeles Dodgers Promotion and Attendance

library(car)  # special functions for linear regression
library(lattice)  # graphics package

# read in data and create a data frame called dodgers
dodgers <- read.csv("dodgers.csv")
print(str(dodgers))  # check the structure of the data frame

# define an ordered day-of-week variable
# for plots and data summaries
dodgers$ordered_day_of_week <- with(data=dodgers,
  ifelse ((day_of_week == "Monday"),1,
  ifelse ((day_of_week == "Tuesday"),2,
  ifelse ((day_of_week == "Wednesday"),3,
  ifelse ((day_of_week == "Thursday"),4,
  ifelse ((day_of_week == "Friday"),5,
  ifelse ((day_of_week == "Saturday"),6,7)))))))
dodgers$ordered_day_of_week <- factor(dodgers$ordered_day_of_week,
  levels=1:7,
  labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))

# exploratory data analysis with standard graphics: attendance by day of week
with(data=dodgers,plot(ordered_day_of_week, attend/1000,
  xlab = "Day of Week", ylab = "Attendance (thousands)",
  col = "violet", las = 1))

# when do the Dodgers use bobblehead promotions
with(dodgers, table(bobblehead,ordered_day_of_week)) # bobbleheads on Tuesday

# define an ordered month variable
# for plots and data summaries
dodgers$ordered_month <- with(data=dodgers,
  ifelse ((month == "APR"),4,
  ifelse ((month == "MAY"),5,
  ifelse ((month == "JUN"),6,
  ifelse ((month == "JUL"),7,
  ifelse ((month == "AUG"),8,
  ifelse ((month == "SEP"),9,10)))))))
dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
  labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))

# exploratory data analysis with standard R graphics: attendance by month
with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month",
  ylab = "Attendance (thousands)", col = "light blue", las = 1))

# exploratory data analysis displaying many variables
# looking at attendance and conditioning on day/night
# the skies and whether or not fireworks are displayed
library(lattice) # used for plotting
# let us prepare a graphical summary of the dodgers data
group.labels <- c("No Fireworks","Fireworks")
group.symbols <- c(21,24)
group.colors <- c("black","black")
group.fill <- c("black","red")
xyplot(attend/1000 ~ temp | skies + day_night,
    data = dodgers, groups = fireworks, pch = group.symbols,
    aspect = 1, cex = 1.5, col = group.colors, fill = group.fill,
    layout = c(2, 2), type = c("p","g"),
    strip=strip.custom(strip.levels=TRUE,strip.names=FALSE, style=1),
    xlab = "Temperature (Degrees Fahrenheit)",
    ylab = "Attendance (thousands)",
    key = list(space = "top",
        text = list(rev(group.labels),col = rev(group.colors)),
        points = list(pch = rev(group.symbols), col = rev(group.colors),
        fill = rev(group.fill))))

# attendance by opponent and day/night game
group.labels <- c("Day","Night")
group.symbols <- c(1,20)
group.symbols.size <- c(2,2.75)
bwplot(opponent ~ attend/1000, data = dod

Re: [R] why change days of the week from a factor to an ordered factor?

2013-12-02 Thread Bill
Duncan,
Thanks. Why doesn't
coloursf2 <- factor(1:8, levels = 8:1)

give an ordering when you do str(coloursf2) like
"8"<"7"<"6" ...

Bill
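
A short sketch that may help with the question above: the levels argument
only fixes the order in which levels are stored and displayed, while the
"<" relation shown by str() appears only for ordered factors. The name
colourso2 is introduced here for illustration; coloursf2 is from the earlier
post.

```r
coloursf2 <- factor(1:8, levels = 8:1)                  # reversed level order, unordered
colourso2 <- factor(1:8, levels = 8:1, ordered = TRUE)  # same levels, now ordered

levels(coloursf2)        # stored order of the levels: "8" first, "1" last
is.ordered(coloursf2)    # a plain factor carries no ordering
is.ordered(colourso2)    # ordered() / ordered = TRUE adds one

# Comparisons are defined only for ordered factors. With levels 8 < 7 < ... < 1,
# the value "2" sorts below the value "1".
colourso2[2] < colourso2[1]
str(colourso2)
```

Note that the Dodgers code quoted in this thread builds a plain factor with
a chosen level order (factor(..., levels = 1:7, labels = ...)), not an
ordered factor; that is already enough to make plots and tables list the
days from Monday to Sunday instead of alphabetically.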


On Mon, Dec 2, 2013 at 6:29 PM, Duncan Mackay  wrote:

> Hi Bill
>
> eg
>
> > colours =  1:8
> > coloursf =  factor(1:8)
> > colourso =  ordered(1:8)
> > str(coloursf)
>  Factor w/ 8 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8
> > str(colourso)
>  Ord.factor w/ 8 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8
>
> coloursf2 <- factor(1:8, levels = 8:1)
> str(coloursf2)
>
> Duncan
>
> Duncan
> Duncan Mackay
> Department of Agronomy and Soil Science
> University of New England
> Armidale NSW 2351
> Email: home: mac...@northnet.com.au
>
>
> ordered used in
> used in MASS::polr and GEE for polytomous logistic regression
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On
> Behalf Of Bill
> Sent: Monday, 2 December 2013 21:24
> To: r-help@r-project.org
> Subject: [R] why change days of the week from a factor to an ordered
> factor?
>
> I am reading the code below. It acts on a csv file called dodgers.csv with
> the following variables.
>
>
> > print(str(dodgers))  # check the structure of the data frame
> 'data.frame':   81 obs. of  12 variables:
>  $ month  : Factor w/ 7 levels "APR","AUG","JUL",..: 1 1 1 1 1 1 1 1 1
> 1 ...
>  $ day: int  10 11 12 13 14 15 23 24 25 27 ...
>  $ attend : int  56000 29729 28328 31601 46549 38359 26376 44014 26345
> 44807 ...
>  $ day_of_week: Factor w/ 7 levels "Friday","Monday",..: 6 7 5 1 3 4 2 6 7
> 1 ...
>  $ opponent   : Factor w/ 17 levels "Angels","Astros",..: 13 13 13 11 11 11
> 3 3 3 10 ...
>  $ temp   : int  67 58 57 54 57 65 60 63 64 66 ...
>  $ skies  : Factor w/ 2 levels "Clear ","Cloudy": 1 2 2 2 2 1 2 2 2 1
> ...
>  $ day_night  : Factor w/ 2 levels "Day","Night": 1 2 2 2 2 1 2 2 2 2 ...
>  $ cap: Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>  $ shirt  : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
>  $ fireworks  : Factor w/ 2 levels "NO","YES": 1 1 1 2 1 1 1 1 1 2 ...
>  $ bobblehead : Factor w/ 2 levels "NO","YES": 1 1 1 1 1 1 1 1 1 1 ...
> NULL
> >
>
> I don't understand why the author of the code decided to make the factor
> days_of_week into an ordered factor. Anyone know why this should be done?
> Thank you.
>
> Here is the code:
>
> # Predictive Model for Los Angeles Dodgers Promotion and Attendance
>
> library(car)  # special functions for linear regression
> library(lattice)  # graphics package
>
> # read in data and create a data frame called dodgers dodgers <-
> read.csv("dodgers.csv")
> print(str(dodgers))  # check the structure of the data frame
>
> # define an ordered day-of-week variable # for plots and data summaries
> dodgers$ordered_day_of_week <- with(data=dodgers,
>   ifelse ((day_of_week == "Monday"),1,
>   ifelse ((day_of_week == "Tuesday"),2,
>   ifelse ((day_of_week == "Wednesday"),3,
>   ifelse ((day_of_week == "Thursday"),4,
>   ifelse ((day_of_week == "Friday"),5,
>   ifelse ((day_of_week == "Saturday"),6,7)))
> dodgers$ordered_day_of_week
> <- factor(dodgers$ordered_day_of_week,
> levels=1:7,
> labels=c("Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"))
>
> # exploratory data analysis with standard graphics: attendance by day of
> week with(data=dodgers,plot(ordered_day_of_week, attend/1000, xlab = "Day
> of
> Week", ylab = "Attendance (thousands)", col = "violet", las = 1))
>
> # when do the Dodgers use bobblehead promotions with(dodgers,
> table(bobblehead,ordered_day_of_week)) # bobbleheads on Tuesday
>
> # define an ordered month variable
> # for plots and data summaries
> dodgers$ordered_month <- with(data=dodgers,
>   ifelse ((month == "APR"),4,
>   ifelse ((month == "MAY"),5,
>   ifelse ((month == "JUN"),6,
>   ifelse ((month == "JUL"),7,
>   ifelse ((month == "AUG"),8,
>   ifelse ((month == "SEP"),9,10)))))))
> dodgers$ordered_month <- factor(dodgers$ordered_month, levels=4:10,
>   labels = c("April", "May", "June", "July", "Aug", "Sept", "Oct"))
>
> # exploratory data analysis with standard R graphics: attendance by month
> with(data=dodgers,plot(ordered_month,attend/1000, xlab = "Month", ylab =
> "Attendance (thousands)", col = "light blue", las = 1))
>
> # exploratory data analysis displaying many variables
> # looking at attendance and conditioning on day/night
> # the skies and whether or not fireworks are displayed
> library(lattice) # used for plotting
> # let us prepare a graphical summary of the dodgers data
> group.labels <- c("No Fireworks","Fireworks")
> group.symbols <- c(21,24)
> group.colors <- c("black","black")
> group.fill <- c("black","red")
> xyplot(attend/1000 ~ temp | skies + day_night,
> data = dodgers, groups = fireworks, pch = group.symbols,
> aspect = 1, cex = 1.5, col = group.colors, fill = group.fill,
> layout = c(2, 2), type = c("p","g"),
> strip=strip.custom(strip.levels=TRUE,strip.names=FALSE, style=1),
> xlab = "Temperature (Degre

[R] What is the easiest way to interpolate vertical values on a square section of a nearly-planar 3D surface

2013-12-02 Thread Mercutio Florid
I want to map out a mostly flat area of land, 300 meters on a side.  


I want to make (x,y,z) triples where x and y each vary between -150 and 150 and 
each (x,y) pair has exactly one z value.


Eventually I will try to use graphics to actually draw this, but my first 
problem is that I need to get 90601 values by interpolating just 13 actual 
measurements.  The measurements are currently unsorted, which might cause 
errors with some functions, and they are in a matrix that looks like this:
      X    Y Value
1    20  135   105
2  -127   69   106
3   -98   47   107
4   -39   69   105
5    49   47   105
6   108   69   107
7    -9    3   106
8   -39    3   106
9  -127  -63   105
10  -39  -41   108
11  -39 -107   106
12   79  -63   107
13   20 -129   107


The syntax for the output seems pretty easy:
x_coord<-seq(from=-150,to=150)
y_coord<-seq(from=-150,to=150)
planebreadth=301
spaceArray<-array(0,c(planebreadth, planebreadth,1))

But what I need to do is somehow interpolate 90601 values into spaceArray, 
based on just 13 measurements.

I looked through some introductory R tutorials such as

cran.r-project.org/doc/manuals/R-intro.html

and I didn't see any examples that seemed to cover this kind of problem.



I did some web searches and there seem to be many, many ways to do 
interpolation.  There are packages like mgcv and DiceKriging.  There are 
various packages that mention splines in their descriptions, such as cobs.   

What is the easiest way to interpolate this kind of data?  Thanks.
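
One option that needs only base R is inverse distance weighting (IDW), sketched here. IDW is not suggested anywhere in this thread; it is used because it is short and needs no packages. The names `idw_point` and `idw_grid` and the choice power = 2 are illustrative assumptions. The sketch fills a 301 x 301 matrix in which z[i, j] is the estimate at (x_coord[i], y_coord[j]); the order of the 13 measurements does not matter.

```r
# The 13 measurements from the post
px <- c(20, -127, -98, -39, 49, 108, -9, -39, -127, -39, -39, 79, 20)
py <- c(135, 69, 47, 69, 47, 69, 3, 3, -63, -41, -107, -63, -129)
pv <- c(105, 106, 107, 105, 105, 107, 106, 106, 105, 108, 106, 107, 107)

x_coord <- seq(from = -150, to = 150)
y_coord <- seq(from = -150, to = 150)

# Inverse-distance-weighted estimate at one location (x0, y0).
# power = 2 is an assumed tuning value, not something from the post.
idw_point <- function(x0, y0, power = 2) {
  d2 <- (px - x0)^2 + (py - y0)^2        # squared distances to the 13 points
  hit <- d2 == 0
  if (any(hit)) return(pv[hit][1])       # exactly on a measurement: return it
  w <- d2^(-power / 2)                   # weight = 1 / distance^power
  sum(w * pv) / sum(w)                   # weighted mean of the measured values
}

# outer() expects a vectorised function, so wrap the scalar version
idw_grid <- Vectorize(idw_point, vectorize.args = c("x0", "y0"))
z <- outer(x_coord, y_coord, idw_grid)

dim(z)                                   # 301 301, i.e. 90601 values
z[x_coord == 20, y_coord == 135]         # 105, the first measurement is reproduced
```

The matrix can be passed straight to image(x_coord, y_coord, z), contour() or persp(). IDW never leaves the measured range (105 to 108 here) and tends to produce "bullseye" patterns around the data points, so a spline or kriging fit from one of the packages mentioned above may give a smoother surface.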



Re: [R] Generating numbers with time-dependent upperbound

2013-12-02 Thread arun
Hi,
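Maybe this helps:
m <- 0.01
T <- 90   # note: this masks T, the shorthand for TRUE
t <- 0:T
set.seed(42)
# one draw per time step from [0, m*(T - t)]
res1 <- sapply(t, function(.t) runif(1, 0, m*(T - .t)))

# or, using the fact that runif() is vectorized over min and max
set.seed(42)
res2 <- runif(91, rep(0, 91), m*(rep(T, 91) - t))
identical(res1, res2)
#[1] TRUE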

A.K.


Hi, 

I need to generate a sequence of random numbers. 
My first idea was to use runif with the upper bound m(T-t), but I couldn't 
get it to work. 

Given: m=0.01 , T=90, t={0,1,...,T} 

Sequence: s={p_0, p_1,..., p_T}, where p_t is chosen from the time-dependent 
interval [0, m(T-t)]   


I would appreciate it if someone could help me.
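
The interval [0, m(T-t)] shrinks as t grows and collapses to [0, 0] at t = T. A small sketch of how the draws can be checked against their own bounds is given below; `horizon`, `upper` and `p` are names chosen here for illustration (the thread calls the horizon T, which masks R's shorthand for TRUE).

```r
m <- 0.01
horizon <- 90                  # T in the question
t <- 0:horizon
upper <- m * (horizon - t)     # time-dependent upper bound m(T - t)

set.seed(42)
p <- runif(length(t), min = 0, max = upper)   # min and max are recycled per draw

length(p)                      # 91 values, p_0 ... p_T
all(p >= 0 & p <= upper)       # TRUE: every p_t lies in its own interval
p[length(p)]                   # 0, because the last interval is [0, 0]
```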
