[R] combine similar variables in chart

2012-11-15 Thread Marcella
I want to make a chart on variables which are taken in 2 different years on 2
the same locations, at different depths, where for every location the
species and the amount of species are determined.

The first chart I want to make is to see how much of every species is
collected in total in a bar or piechart.
Then I want to make a few charts with the species per location and per depth

Here is a subset of my dataset

Station Latitude(S) Longitude(E)Depth(m)Sa.Code Sp.Code N
UPG02   -5.122355036119.339401  4   UPG02_4 Amphis_lesson   398
UPG02   -5.122355036119.339401  4   UPG02_4 Amphis_radiat   58
UPG02   -5.122355036119.339401  4   UPG02_4 Penero_planat   3
UPG02   -5.122355036119.339401  4.8 UPG02_4.8   Sphero_sp   
5
UPG02   -5.122355036119.339401  4.8 UPG02_4.8   Amphis_lesson   
204
UPG02   -5.122355036119.339401  4.8 UPG02_4.8   Amphis_radiat   
18
UPG02   -5.122355036119.339401  9   UPG02_9 Sphero_sp   1
UPG02   -5.122355036119.339401  9   UPG02_9 Eponid_repand   2
UPG02   -5.122355036119.339401  9   UPG02_9 Planor_sp   7
UPG02   -5.122355036119.339401  15  UPG02_15Opercu_ammono   
1
UPG02   -5.122355036119.339401  15  UPG02_15Hetero_depres   
94
UPG02   -5.122355036119.339401  15  UPG02_15Amphis_lesson   
101
UPG02   -5.122355036119.339401  18  UPG02_18Sphero_sp   
3
UPG02   -5.122355036119.339401  18  UPG02_18Millio_sp   
1
UPG02   -5.122355036119.339401  18  UPG02_18Calcar_mayori   
56
UPG03   -5.136863021119.387289  3   UPG03_3 Millio_sp   2
UPG03   -5.136863021119.387289  3   UPG03_3 Sahuli_sp1  1
UPG03   -5.136863021119.387289  3   UPG03_3 Elphid_cratic   20
UPG03   -5.136863021119.387289  4   UPG03_4 Planor_sp   1
UPG03   -5.136863021119.387289  4   UPG03_4 Opercu_elegan   1
UPG03   -5.136863021119.387289  4   UPG03_4 Septot_sp   1
UPG03   -5.136863021119.387289  5   UPG03_5 Sphero_sp   1
UPG03   -5.136863021119.387289  5   UPG03_5 Millio_sp   1
UPG03   -5.136863021119.387289  5   UPG03_5 Ammoni_sp1  1
UPG03   -5.136863021119.387289  6   UPG03_6 Calcar_sp1  2
UPG03   -5.136863021119.387289  6   UPG03_6 Calcar_mayori   68
UPG03   -5.136863021119.387289  6   UPG03_6 Elphid_cratic   17
UPG03   -5.136863021119.387289  9   UPG03_9 Sphero_sp   2
UPG03   -5.136863021119.387289  9   UPG03_9 Calcar_mayori   323
UPG03   -5.136863021119.387289  9   UPG03_9 Calcar_spengl   2
UPG28   -5.122355036119.339401  5   UPG28_5 Penero_planat   8
UPG28   -5.122355036119.339401  5   UPG28_5 Amphis_latera   1
UPG28   -5.122355036119.339401  5   UPG28_5 Sorite_sp   1
UPG28   -5.122355036119.339401  8   UPG28_8 Bilocu_sp3  1
UPG28   -5.122355036119.339401  8   UPG28_8 Verteb_sp   1
UPG28   -5.122355036119.339401  8   UPG28_8 Penero_planat   7
UPG28   -5.122355036119.339401  11  UPG28_11Placop_c.f. 
1
UPG28   -5.122355036119.339401  11  UPG28_11Sphero_sp   
2
UPG28   -5.122355036119.339401  11  UPG28_11Siphon_siphon   
1
UPG28   -5.122355036119.339401  14  UPG28_14Sphero_sp   
1
UPG28   -5.122355036119.339401  14  UPG28_14Sorite_sp   
1
UPG28   -5.122355036119.339401  14  UPG28_14Spirol_hadaii   
1
UPG28   -5.122355036119.339401  17  UPG28_17Sphero_sp   
3
UPG28   -5.122355036119.339401  17  UPG28_17Siphon_siphon   
1
UPG28   -5.122355036119.339401  17  UPG28_17Sorite_sp   
1
UPG32   -5.136863021119.387289  4   UPG32_4 Pyrgo_striol1
UPG32   -5.136863021119.387289  4   UPG32_4 Ammoma_alveol   1
UPG32   -5.136863021119.387289  4   UPG32_4 Quinqu_netstr   1
UPG32   -5.136863021119.387289  6   UPG32_6 Neorot_calcar   1
UPG32   -5.136863021119.387289  6   UPG32_6 Quinqu_c.f.ag   1
UPG32   -5.136863021119.387289  6   UPG32_6 Amphis_lesson   50
UPG32   -5.136863021119.387289  8   UPG32_8 Spirol_sp3  1
UPG32   -5.136863021119.387289  8   UPG32_8 Triloc_tricar   1
UPG32   -5.136863021119.387289  8   UPG32_8 Amphis_lesson   34




--
View this message in context: 
http://r.789695.n4.nabble.com/combine-similar-variables-in-chart-tp4649581.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting g

[R] Nested Logit Model

2014-04-03 Thread Tim Marcella
Hi All,

I am using the package 'mlogit'

I am having trouble constructing a nested logit model. I am headed down
this path as I am violating the IIA assumption when running a multinomial
logistic regression.

I am analyzing the choice a seabird floating on the surface of the water
makes when approached by a ship. I have three choices defined for the
dependent variable 'act2'; fly, dive, or remain on the water (none).
Independent variables I hypothesize might affect the decision are
perpendicular distance to the ship's path, group size, location, distance
to shore, speed of the ship, number of ships in the area and wind speed.
All the variables are individual specific.

A non-nested mutinomial logit model will run fine using the following code
(data converted to long format already).

 > mod.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) + as.factor(ship) +
as.factor(sea) + avgknots + shore + as.factor(location),long_perp ,
reflevel = 'none')

After assessing that I am violating the IIA assumption I attempted to run a
nested logit model with the following code.

> nested.1 <- mlogit(act2 ~  0 | dist + as.factor(grp.bin) +
as.factor(ship) +
as.factor(sea) + avgknots + shore + as.factor(location),long_perp ,
reflevel = 'none',
nests = list(react = c("dive","fly"), remain = "none"), unscaled=TRUE)

I get the following error code.

"Error in solve.default(crossprod(attr(x, "gradi")[, !fixed])) :
  Lapack routine dgesv: system is exactly singular: U[6,6] = 0"

Does this indicate that my variables are col-linear?, overfitting? I have
tried to remove all but one variable and rerun and still get an error code.

Can anyone help me get over this hump or help suggest a different modeling
approach?

I am mainly interested in the probability of choosing to react (fly or
dive) or not, and then once a reaction has been made, which one is chosen
and how these decisions relate to perpendicular distance to the ship's path.

Thanks, Tim

-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Time interactions for coxph object

2014-04-07 Thread Tim Marcella
Hi All,

I am working in the 'survival' library and need to know how to code a
variable time interaction for left truncated data (not all subjects
followed from the start of the study). I therefor add in the entry time and
exit time variables into the formula.

Here is my basic formula

CSHR.shore.fly <- coxph(Surv(entry, exit, to == 1) ~ shore.cat, data
  glba.mod)

My variable shore.cat is violating the proportional hazards assumption so I
am trying to add in an interaction with time. Do I interact exit? entry? or
the range of the two?

Thanks, Tim

-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plot mlogit object

2014-04-14 Thread Tim Marcella
Hi,

I cannot figure out how or if I even can plot the results from a nested
multinomial logit model. I am using the mlogit package.

Does anyone have a lead on any tutorials? Both of the vignettes are lacking
plotting instructions.

Thanks, Tim

-- 
Tim Marcella

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: using coord_trans for logit -> probability

2014-04-16 Thread Tim Marcella
I think all you have to do is add type="response" to your call for the
predictions.

Does this work for you

# get fitted values on the logit scale
pred <- data.frame(Arthritis,
   predict(arth.logistic, se.fit=TRUE,type="response"))

library(ggplot2)
library(scales)
# plot on logit scale
gg <- ggplot(pred, aes(x=Age, y=fit)) +
  geom_line(size = 2) + theme_bw() +
  geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
  ymax = fit + 1.96 * se.fit,), alpha = 0.2,  color =
"transparent") +
  labs(x = "Age", y = "Log odds (Better)")
gg

-Tim


On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly  wrote:

> I'm trying to see if & how I can use coord_trans() with ggplot2 to
> transform the
> Y axis of a plot on the logit scale to the probability scale, as opposed
> to  recalculating
> everything "manually" and constructing a new plot.
> Here is a simple example of the 'base' plot I'd like to transform:
>
> data(Arthritis, package="vcdExtra")
> Arthritis$Better <- as.numeric(Arthritis$Improved > "None")
> arth.logistic <- glm(Better ~ Age, data=Arthritis, family=binomial)
>
> # get fitted values on the logit scale
> pred <- data.frame(Arthritis,
>predict(arth.logistic, se.fit=TRUE))
> library(ggplot2)
> library(scales)
> # plot on logit scale
> gg <- ggplot(pred, aes(x=Age, y=fit)) +
>   geom_line(size = 2) + theme_bw() +
>   geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
>   ymax = fit + 1.96 * se.fit,), alpha = 0.2,  color =
> "transparent") +
>   labs(x = "Age", y = "Log odds (Better)")
> gg
>
> Things I've tried that don't work:
>
> > gg + coord_trans(ytrans="logis")
> Error in get(as.character(FUN), mode = "function", envir = envir) :
>   object 'logis_trans' of mode 'function' was not found
> >
> > gg + coord_trans(ytrans=probability_trans("logis"))
> Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
> In addition: Warning message:
> In qfun(x, ...) : NaNs produced
> >
>
> Doing what I want "manually":
>
> # doing it manually
> pred2 <- within(pred, {
>  prob  <- plogis(fit)
>  lower <- plogis(fit - 1.96 * se.fit)
>  upper <- plogis(fit + 1.96 * se.fit)
>  })
>
>
> gg2 <- ggplot(pred2, aes(x=Age, y=prob)) +
>   geom_line(size = 2) + theme_bw() +
>   geom_ribbon(aes(ymin = lower,
>   ymax = upper), alpha = 0.2,  color = "transparent") +
>   labs(x = "Age", y = "Probability (Better)")
> gg2
>
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca
> Professor, Psychology Dept. & Chair, Quantitative Methods
> York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
> 4700 Keele StreetWeb:   http://www.datavis.ca
> Toronto, ONT  M3J 1P3 CANADA
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: using coord_trans for logit -> probability

2014-04-16 Thread Tim Marcella
Sorry I jumped the gun. That does not provide you with the same plot as gg2
that you are aiming for.

-T


On Wed, Apr 16, 2014 at 7:37 PM, Tim Marcella  wrote:

> I think all you have to do is add type="response" to your call for the
> predictions.
>
> Does this work for you
>
> # get fitted values on the logit scale
> pred <- data.frame(Arthritis,
>predict(arth.logistic, se.fit=TRUE,type="response"))
>
> library(ggplot2)
> library(scales)
> # plot on logit scale
> gg <- ggplot(pred, aes(x=Age, y=fit)) +
>   geom_line(size = 2) + theme_bw() +
>   geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
>   ymax = fit + 1.96 * se.fit,), alpha = 0.2,  color =
> "transparent") +
>   labs(x = "Age", y = "Log odds (Better)")
> gg
>
> -Tim
>
>
> On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly wrote:
>
>> I'm trying to see if & how I can use coord_trans() with ggplot2 to
>> transform the
>> Y axis of a plot on the logit scale to the probability scale, as opposed
>> to  recalculating
>> everything "manually" and constructing a new plot.
>> Here is a simple example of the 'base' plot I'd like to transform:
>>
>> data(Arthritis, package="vcdExtra")
>> Arthritis$Better <- as.numeric(Arthritis$Improved > "None")
>> arth.logistic <- glm(Better ~ Age, data=Arthritis, family=binomial)
>>
>> # get fitted values on the logit scale
>> pred <- data.frame(Arthritis,
>>predict(arth.logistic, se.fit=TRUE))
>> library(ggplot2)
>> library(scales)
>> # plot on logit scale
>> gg <- ggplot(pred, aes(x=Age, y=fit)) +
>>   geom_line(size = 2) + theme_bw() +
>>   geom_ribbon(aes(ymin = fit - 1.96 * se.fit,
>>   ymax = fit + 1.96 * se.fit,), alpha = 0.2,  color =
>> "transparent") +
>>   labs(x = "Age", y = "Log odds (Better)")
>> gg
>>
>> Things I've tried that don't work:
>>
>> > gg + coord_trans(ytrans="logis")
>> Error in get(as.character(FUN), mode = "function", envir = envir) :
>>   object 'logis_trans' of mode 'function' was not found
>> >
>> > gg + coord_trans(ytrans=probability_trans("logis"))
>> Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed
>> In addition: Warning message:
>> In qfun(x, ...) : NaNs produced
>> >
>>
>> Doing what I want "manually":
>>
>> # doing it manually
>> pred2 <- within(pred, {
>>  prob  <- plogis(fit)
>>  lower <- plogis(fit - 1.96 * se.fit)
>>  upper <- plogis(fit + 1.96 * se.fit)
>>  })
>>
>>
>> gg2 <- ggplot(pred2, aes(x=Age, y=prob)) +
>>   geom_line(size = 2) + theme_bw() +
>>   geom_ribbon(aes(ymin = lower,
>>   ymax = upper), alpha = 0.2,  color = "transparent") +
>>   labs(x = "Age", y = "Probability (Better)")
>> gg2
>>
>>
>>
>> --
>> Michael Friendly Email: friendly AT yorku DOT ca
>> Professor, Psychology Dept. & Chair, Quantitative Methods
>> York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
>> 4700 Keele StreetWeb:   http://www.datavis.ca
>> Toronto, ONT  M3J 1P3 CANADA
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Tim Marcella
> 508.498.2989
> timmarce...@gmail.com
>



-- 
Tim Marcella
508.498.2989
timmarce...@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting list names within a looped function

2012-11-15 Thread Marcella Meerkerk
No this doesn't seem like what I want.

As you can see in the dataset sometimes there are the same species on
different locations. First I want to make a chart on how many of a single
species there are in this dataset and preferrably for all species in one
chart.

Second I want to make such a chart per Location or Station as called in the
data set and third such a chart per Sa.Code

Gr,
Marcella

2012/11/15 Berend Hasselman 

>
> On 15-11-2012, at 11:14, Gaj Stan (BIGCAT) wrote:
>
> > Hello all,
> >
> > I have the following problem:
> >
> > 1) A list was defined as 'a'
> >
> > a <- list("var1"=c(100:1), "var2"=c(1:100), "var3"=rnorm(100))
> >
> > 2) a function 'foo' was defined that extracts the variable name assigned
> to x using the deparse(substitute()) functionality. This name will then be
> used within the function to generate specific output files, etc.
> >
> > foo <- function(x) {
> >  print( deparse(substitute(x)) )
> > }
> >
> > However, I am currently interested in looping through all list variables
> and extract the list variable name from within the function. The current
> loop (see below) will result in
> >
> > for(i in 1:length(a)) {
> >  foo(a[[i]])
> > }
> > [1] "a[[i]]"
> >
> > which actually does what I expected of deparse(substitute(x)), but is
> not what I wanted. I would like to end up with something like
> >
> > [1] "var1"
> > [1] "var2"
> > [1] "var3"
> >
> > or
> >
> > [1] "a[[\"var1\"]]"
> > [1] "a[[\"var2\"]]"
> > [1] "a[[\"var3\"]]"
> >
> > Keep in mind that x has to be a matrix, and not a list. This to keep the
> function as general as possible.
> >
> > Does anyone have any idea on how to tackle this? Is
> deparse(substitute(x)) here the best way to go? Are there alternatives?
>
> I'm not sure if I understand what you want but will this give you what you
> seem to want:
>
> names(a)
>
> Berend
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating RMSE in R from hurdle regression object

2014-03-12 Thread Tim Marcella
Hi,

My data is characterized by many zeros (82%) and overdispersion. I have
chosen to model with hurdle regression (pscl package) with a negative
binomial distribution for the count data. In an effort to validate the
model I would like to calculate the RMSE of the predicted vs. the observed
values. From my reading I understand that this is the calculated on the raw
residuals generated from the model output. This is the formula I used

H1.RMSE <- sqrt(mean(H1$residuals^2)) # Where H1 is my fitted hurdle
model

I get 46.7 as the RMSE. This seems high to me based on the model results.
Assuming my formula and my understanding of RMSE is correct (and please
correct me if I am wrong) I question whether this is an appropriate use of
validation for this particular structure of model. The hurdle model
correctly predicts all of my zeros. The predictions I get from the fitted
model are all values greater than zero. From my readings I understand that
the predictions from the fitted hurdle model are means generated for the
particular covariate environment based on the model coefficients. If this
is truly the case it does not make sense to compare these means to the
observations. This will generate large residuals (only 18% of the
observations contain counts greater than 0, while the predicted counts all
exceed 0). It seems like comparing apples to oranges. Other correlative
tests (Pearson's r, Spearman's p) would seem to be comparing the mean
predicted value for particular covariate to the observed which again is
heavily dominated by zeros.

Any tips on how best to validate hurdle models in R?

Thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Negative binomial models and censored observations

2014-03-13 Thread Tim Marcella
Hi,

I am working with hurdle models in the pscl package to model zero inflated
overdispersed count data and want to incorporate censored observations into
the equation. 33% of the observed positive count data is right censored,
i.e. subject lost to follow up during the duration of the study. Can this
be accounted for in the hurdle() function?

Thanks, Tim

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Coding for segmented regression within a hurdle model

2014-03-15 Thread Tim Marcella
Hi,

I am using a two part hurdle model to account for zero inflation and
overdispersion in my count data. I would like to account for a segmented or
breakpoint relationship in the binomial logistic hurdle model and pass
these results onto the count model (negative binomial).

Using the segemented package I have determined that my data supports one
breakpoint at 3.85. The slope to this point is significant and will affect
the presence of zeros in a linear fashion. The slope > 3.85 is
non-significant and estimated to not help predict the presence of zeros in
the data (threshold effect). Here are the results from this model

Estimated Break-Point(s):
   Est. St.Err
 3.853  1.372

t value for the gap-variable(s) V:  0

Meaningful coefficients of the linear terms:
   Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.2750 0.3556  -0.774   0.4392
approach_km -0.4520 0.2184  -2.069   0.0385 *
sea2 0.3627 0.2280   1.591   0.1117
U1.approach_km   0.4543 0.2188   2.076   NA

U1.approach_km is the estimate for the second slope. The actual estimated
slope for the section section is the difference between this value and the
approach_km value (0.0023).

I think that I have found a way to "maually" code this into the hurdle
model as follows

hurdle.fit <- hurdle(tot_f ~  x1 + x2 + x3 | approach_km +
I(pmax(approach_km-3.849,0)) + sea )

When I look at the estimated coefficients from the "manual" code it gives
the same values. However, the std.errors are estimated lower.

Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.274410.29347  -0.9350.350
approach_km -0.452610.09993  -4.529 5.92e-06 ***
I(pmax(approach_km - 3.849, 0))  0.454860.10723   4.242 2.22e-05 ***
sea2 0.362710.22803   1.5910.112

Question # 1: Does the hurdle equation use the standard errors from the
zero model when building the count predictions? If no then I guess I would
not have to worry about this and can just report the original std.errors
and associated p values from the segemented object in the pub.
Question # 2: If the count model uses the std.errors, how can I reformulate
this equation to generate the original std.errors.

Thanks, Tim

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.