[R] combine similar variables in chart
I want to make a chart on variables which are taken in 2 different years on 2 the same locations, at different depths, where for every location the species and the amount of species are determined. The first chart I want to make is to see how much of every species is collected in total in a bar or piechart. Then I want to make a few charts with the species per location and per depth Here is a subset of my dataset Station Latitude(S) Longitude(E)Depth(m)Sa.Code Sp.Code N UPG02 -5.122355036119.339401 4 UPG02_4 Amphis_lesson 398 UPG02 -5.122355036119.339401 4 UPG02_4 Amphis_radiat 58 UPG02 -5.122355036119.339401 4 UPG02_4 Penero_planat 3 UPG02 -5.122355036119.339401 4.8 UPG02_4.8 Sphero_sp 5 UPG02 -5.122355036119.339401 4.8 UPG02_4.8 Amphis_lesson 204 UPG02 -5.122355036119.339401 4.8 UPG02_4.8 Amphis_radiat 18 UPG02 -5.122355036119.339401 9 UPG02_9 Sphero_sp 1 UPG02 -5.122355036119.339401 9 UPG02_9 Eponid_repand 2 UPG02 -5.122355036119.339401 9 UPG02_9 Planor_sp 7 UPG02 -5.122355036119.339401 15 UPG02_15Opercu_ammono 1 UPG02 -5.122355036119.339401 15 UPG02_15Hetero_depres 94 UPG02 -5.122355036119.339401 15 UPG02_15Amphis_lesson 101 UPG02 -5.122355036119.339401 18 UPG02_18Sphero_sp 3 UPG02 -5.122355036119.339401 18 UPG02_18Millio_sp 1 UPG02 -5.122355036119.339401 18 UPG02_18Calcar_mayori 56 UPG03 -5.136863021119.387289 3 UPG03_3 Millio_sp 2 UPG03 -5.136863021119.387289 3 UPG03_3 Sahuli_sp1 1 UPG03 -5.136863021119.387289 3 UPG03_3 Elphid_cratic 20 UPG03 -5.136863021119.387289 4 UPG03_4 Planor_sp 1 UPG03 -5.136863021119.387289 4 UPG03_4 Opercu_elegan 1 UPG03 -5.136863021119.387289 4 UPG03_4 Septot_sp 1 UPG03 -5.136863021119.387289 5 UPG03_5 Sphero_sp 1 UPG03 -5.136863021119.387289 5 UPG03_5 Millio_sp 1 UPG03 -5.136863021119.387289 5 UPG03_5 Ammoni_sp1 1 UPG03 -5.136863021119.387289 6 UPG03_6 Calcar_sp1 2 UPG03 -5.136863021119.387289 6 UPG03_6 Calcar_mayori 68 UPG03 -5.136863021119.387289 6 UPG03_6 Elphid_cratic 17 UPG03 -5.136863021119.387289 9 UPG03_9 Sphero_sp 2 UPG03 -5.136863021119.387289 9 UPG03_9 Calcar_mayori 323 UPG03 -5.136863021119.387289 9 UPG03_9 Calcar_spengl 2 UPG28 -5.122355036119.339401 5 UPG28_5 Penero_planat 8 UPG28 -5.122355036119.339401 5 UPG28_5 Amphis_latera 1 UPG28 -5.122355036119.339401 5 UPG28_5 Sorite_sp 1 UPG28 -5.122355036119.339401 8 UPG28_8 Bilocu_sp3 1 UPG28 -5.122355036119.339401 8 UPG28_8 Verteb_sp 1 UPG28 -5.122355036119.339401 8 UPG28_8 Penero_planat 7 UPG28 -5.122355036119.339401 11 UPG28_11Placop_c.f. 1 UPG28 -5.122355036119.339401 11 UPG28_11Sphero_sp 2 UPG28 -5.122355036119.339401 11 UPG28_11Siphon_siphon 1 UPG28 -5.122355036119.339401 14 UPG28_14Sphero_sp 1 UPG28 -5.122355036119.339401 14 UPG28_14Sorite_sp 1 UPG28 -5.122355036119.339401 14 UPG28_14Spirol_hadaii 1 UPG28 -5.122355036119.339401 17 UPG28_17Sphero_sp 3 UPG28 -5.122355036119.339401 17 UPG28_17Siphon_siphon 1 UPG28 -5.122355036119.339401 17 UPG28_17Sorite_sp 1 UPG32 -5.136863021119.387289 4 UPG32_4 Pyrgo_striol1 UPG32 -5.136863021119.387289 4 UPG32_4 Ammoma_alveol 1 UPG32 -5.136863021119.387289 4 UPG32_4 Quinqu_netstr 1 UPG32 -5.136863021119.387289 6 UPG32_6 Neorot_calcar 1 UPG32 -5.136863021119.387289 6 UPG32_6 Quinqu_c.f.ag 1 UPG32 -5.136863021119.387289 6 UPG32_6 Amphis_lesson 50 UPG32 -5.136863021119.387289 8 UPG32_8 Spirol_sp3 1 UPG32 -5.136863021119.387289 8 UPG32_8 Triloc_tricar 1 UPG32 -5.136863021119.387289 8 UPG32_8 Amphis_lesson 34 -- View this message in context: http://r.789695.n4.nabble.com/combine-similar-variables-in-chart-tp4649581.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting g
[R] Nested Logit Model
Hi All, I am using the package 'mlogit' I am having trouble constructing a nested logit model. I am headed down this path as I am violating the IIA assumption when running a multinomial logistic regression. I am analyzing the choice a seabird floating on the surface of the water makes when approached by a ship. I have three choices defined for the dependent variable 'act2'; fly, dive, or remain on the water (none). Independent variables I hypothesize might affect the decision are perpendicular distance to the ship's path, group size, location, distance to shore, speed of the ship, number of ships in the area and wind speed. All the variables are individual specific. A non-nested mutinomial logit model will run fine using the following code (data converted to long format already). > mod.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) + as.factor(ship) + as.factor(sea) + avgknots + shore + as.factor(location),long_perp , reflevel = 'none') After assessing that I am violating the IIA assumption I attempted to run a nested logit model with the following code. > nested.1 <- mlogit(act2 ~ 0 | dist + as.factor(grp.bin) + as.factor(ship) + as.factor(sea) + avgknots + shore + as.factor(location),long_perp , reflevel = 'none', nests = list(react = c("dive","fly"), remain = "none"), unscaled=TRUE) I get the following error code. "Error in solve.default(crossprod(attr(x, "gradi")[, !fixed])) : Lapack routine dgesv: system is exactly singular: U[6,6] = 0" Does this indicate that my variables are col-linear?, overfitting? I have tried to remove all but one variable and rerun and still get an error code. Can anyone help me get over this hump or help suggest a different modeling approach? I am mainly interested in the probability of choosing to react (fly or dive) or not, and then once a reaction has been made, which one is chosen and how these decisions relate to perpendicular distance to the ship's path. Thanks, Tim -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Time interactions for coxph object
Hi All, I am working in the 'survival' library and need to know how to code a variable time interaction for left truncated data (not all subjects followed from the start of the study). I therefor add in the entry time and exit time variables into the formula. Here is my basic formula CSHR.shore.fly <- coxph(Surv(entry, exit, to == 1) ~ shore.cat, data glba.mod) My variable shore.cat is violating the proportional hazards assumption so I am trying to add in an interaction with time. Do I interact exit? entry? or the range of the two? Thanks, Tim -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plot mlogit object
Hi, I cannot figure out how or if I even can plot the results from a nested multinomial logit model. I am using the mlogit package. Does anyone have a lead on any tutorials? Both of the vignettes are lacking plotting instructions. Thanks, Tim -- Tim Marcella [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2: using coord_trans for logit -> probability
I think all you have to do is add type="response" to your call for the predictions. Does this work for you # get fitted values on the logit scale pred <- data.frame(Arthritis, predict(arth.logistic, se.fit=TRUE,type="response")) library(ggplot2) library(scales) # plot on logit scale gg <- ggplot(pred, aes(x=Age, y=fit)) + geom_line(size = 2) + theme_bw() + geom_ribbon(aes(ymin = fit - 1.96 * se.fit, ymax = fit + 1.96 * se.fit,), alpha = 0.2, color = "transparent") + labs(x = "Age", y = "Log odds (Better)") gg -Tim On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly wrote: > I'm trying to see if & how I can use coord_trans() with ggplot2 to > transform the > Y axis of a plot on the logit scale to the probability scale, as opposed > to recalculating > everything "manually" and constructing a new plot. > Here is a simple example of the 'base' plot I'd like to transform: > > data(Arthritis, package="vcdExtra") > Arthritis$Better <- as.numeric(Arthritis$Improved > "None") > arth.logistic <- glm(Better ~ Age, data=Arthritis, family=binomial) > > # get fitted values on the logit scale > pred <- data.frame(Arthritis, >predict(arth.logistic, se.fit=TRUE)) > library(ggplot2) > library(scales) > # plot on logit scale > gg <- ggplot(pred, aes(x=Age, y=fit)) + > geom_line(size = 2) + theme_bw() + > geom_ribbon(aes(ymin = fit - 1.96 * se.fit, > ymax = fit + 1.96 * se.fit,), alpha = 0.2, color = > "transparent") + > labs(x = "Age", y = "Log odds (Better)") > gg > > Things I've tried that don't work: > > > gg + coord_trans(ytrans="logis") > Error in get(as.character(FUN), mode = "function", envir = envir) : > object 'logis_trans' of mode 'function' was not found > > > > gg + coord_trans(ytrans=probability_trans("logis")) > Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed > In addition: Warning message: > In qfun(x, ...) : NaNs produced > > > > Doing what I want "manually": > > # doing it manually > pred2 <- within(pred, { > prob <- plogis(fit) > lower <- plogis(fit - 1.96 * se.fit) > upper <- plogis(fit + 1.96 * se.fit) > }) > > > gg2 <- ggplot(pred2, aes(x=Age, y=prob)) + > geom_line(size = 2) + theme_bw() + > geom_ribbon(aes(ymin = lower, > ymax = upper), alpha = 0.2, color = "transparent") + > labs(x = "Age", y = "Probability (Better)") > gg2 > > > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele StreetWeb: http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2: using coord_trans for logit -> probability
Sorry I jumped the gun. That does not provide you with the same plot as gg2 that you are aiming for. -T On Wed, Apr 16, 2014 at 7:37 PM, Tim Marcella wrote: > I think all you have to do is add type="response" to your call for the > predictions. > > Does this work for you > > # get fitted values on the logit scale > pred <- data.frame(Arthritis, >predict(arth.logistic, se.fit=TRUE,type="response")) > > library(ggplot2) > library(scales) > # plot on logit scale > gg <- ggplot(pred, aes(x=Age, y=fit)) + > geom_line(size = 2) + theme_bw() + > geom_ribbon(aes(ymin = fit - 1.96 * se.fit, > ymax = fit + 1.96 * se.fit,), alpha = 0.2, color = > "transparent") + > labs(x = "Age", y = "Log odds (Better)") > gg > > -Tim > > > On Wed, Apr 16, 2014 at 7:03 PM, Michael Friendly wrote: > >> I'm trying to see if & how I can use coord_trans() with ggplot2 to >> transform the >> Y axis of a plot on the logit scale to the probability scale, as opposed >> to recalculating >> everything "manually" and constructing a new plot. >> Here is a simple example of the 'base' plot I'd like to transform: >> >> data(Arthritis, package="vcdExtra") >> Arthritis$Better <- as.numeric(Arthritis$Improved > "None") >> arth.logistic <- glm(Better ~ Age, data=Arthritis, family=binomial) >> >> # get fitted values on the logit scale >> pred <- data.frame(Arthritis, >>predict(arth.logistic, se.fit=TRUE)) >> library(ggplot2) >> library(scales) >> # plot on logit scale >> gg <- ggplot(pred, aes(x=Age, y=fit)) + >> geom_line(size = 2) + theme_bw() + >> geom_ribbon(aes(ymin = fit - 1.96 * se.fit, >> ymax = fit + 1.96 * se.fit,), alpha = 0.2, color = >> "transparent") + >> labs(x = "Age", y = "Log odds (Better)") >> gg >> >> Things I've tried that don't work: >> >> > gg + coord_trans(ytrans="logis") >> Error in get(as.character(FUN), mode = "function", envir = envir) : >> object 'logis_trans' of mode 'function' was not found >> > >> > gg + coord_trans(ytrans=probability_trans("logis")) >> Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed >> In addition: Warning message: >> In qfun(x, ...) : NaNs produced >> > >> >> Doing what I want "manually": >> >> # doing it manually >> pred2 <- within(pred, { >> prob <- plogis(fit) >> lower <- plogis(fit - 1.96 * se.fit) >> upper <- plogis(fit + 1.96 * se.fit) >> }) >> >> >> gg2 <- ggplot(pred2, aes(x=Age, y=prob)) + >> geom_line(size = 2) + theme_bw() + >> geom_ribbon(aes(ymin = lower, >> ymax = upper), alpha = 0.2, color = "transparent") + >> labs(x = "Age", y = "Probability (Better)") >> gg2 >> >> >> >> -- >> Michael Friendly Email: friendly AT yorku DOT ca >> Professor, Psychology Dept. & Chair, Quantitative Methods >> York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 >> 4700 Keele StreetWeb: http://www.datavis.ca >> Toronto, ONT M3J 1P3 CANADA >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/ >> posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Tim Marcella > 508.498.2989 > timmarce...@gmail.com > -- Tim Marcella 508.498.2989 timmarce...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting list names within a looped function
No this doesn't seem like what I want. As you can see in the dataset sometimes there are the same species on different locations. First I want to make a chart on how many of a single species there are in this dataset and preferrably for all species in one chart. Second I want to make such a chart per Location or Station as called in the data set and third such a chart per Sa.Code Gr, Marcella 2012/11/15 Berend Hasselman > > On 15-11-2012, at 11:14, Gaj Stan (BIGCAT) wrote: > > > Hello all, > > > > I have the following problem: > > > > 1) A list was defined as 'a' > > > > a <- list("var1"=c(100:1), "var2"=c(1:100), "var3"=rnorm(100)) > > > > 2) a function 'foo' was defined that extracts the variable name assigned > to x using the deparse(substitute()) functionality. This name will then be > used within the function to generate specific output files, etc. > > > > foo <- function(x) { > > print( deparse(substitute(x)) ) > > } > > > > However, I am currently interested in looping through all list variables > and extract the list variable name from within the function. The current > loop (see below) will result in > > > > for(i in 1:length(a)) { > > foo(a[[i]]) > > } > > [1] "a[[i]]" > > > > which actually does what I expected of deparse(substitute(x)), but is > not what I wanted. I would like to end up with something like > > > > [1] "var1" > > [1] "var2" > > [1] "var3" > > > > or > > > > [1] "a[[\"var1\"]]" > > [1] "a[[\"var2\"]]" > > [1] "a[[\"var3\"]]" > > > > Keep in mind that x has to be a matrix, and not a list. This to keep the > function as general as possible. > > > > Does anyone have any idea on how to tackle this? Is > deparse(substitute(x)) here the best way to go? Are there alternatives? > > I'm not sure if I understand what you want but will this give you what you > seem to want: > > names(a) > > Berend > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calculating RMSE in R from hurdle regression object
Hi, My data is characterized by many zeros (82%) and overdispersion. I have chosen to model with hurdle regression (pscl package) with a negative binomial distribution for the count data. In an effort to validate the model I would like to calculate the RMSE of the predicted vs. the observed values. From my reading I understand that this is the calculated on the raw residuals generated from the model output. This is the formula I used H1.RMSE <- sqrt(mean(H1$residuals^2)) # Where H1 is my fitted hurdle model I get 46.7 as the RMSE. This seems high to me based on the model results. Assuming my formula and my understanding of RMSE is correct (and please correct me if I am wrong) I question whether this is an appropriate use of validation for this particular structure of model. The hurdle model correctly predicts all of my zeros. The predictions I get from the fitted model are all values greater than zero. From my readings I understand that the predictions from the fitted hurdle model are means generated for the particular covariate environment based on the model coefficients. If this is truly the case it does not make sense to compare these means to the observations. This will generate large residuals (only 18% of the observations contain counts greater than 0, while the predicted counts all exceed 0). It seems like comparing apples to oranges. Other correlative tests (Pearson's r, Spearman's p) would seem to be comparing the mean predicted value for particular covariate to the observed which again is heavily dominated by zeros. Any tips on how best to validate hurdle models in R? Thanks [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Negative binomial models and censored observations
Hi, I am working with hurdle models in the pscl package to model zero inflated overdispersed count data and want to incorporate censored observations into the equation. 33% of the observed positive count data is right censored, i.e. subject lost to follow up during the duration of the study. Can this be accounted for in the hurdle() function? Thanks, Tim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Coding for segmented regression within a hurdle model
Hi, I am using a two part hurdle model to account for zero inflation and overdispersion in my count data. I would like to account for a segmented or breakpoint relationship in the binomial logistic hurdle model and pass these results onto the count model (negative binomial). Using the segemented package I have determined that my data supports one breakpoint at 3.85. The slope to this point is significant and will affect the presence of zeros in a linear fashion. The slope > 3.85 is non-significant and estimated to not help predict the presence of zeros in the data (threshold effect). Here are the results from this model Estimated Break-Point(s): Est. St.Err 3.853 1.372 t value for the gap-variable(s) V: 0 Meaningful coefficients of the linear terms: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.2750 0.3556 -0.774 0.4392 approach_km -0.4520 0.2184 -2.069 0.0385 * sea2 0.3627 0.2280 1.591 0.1117 U1.approach_km 0.4543 0.2188 2.076 NA U1.approach_km is the estimate for the second slope. The actual estimated slope for the section section is the difference between this value and the approach_km value (0.0023). I think that I have found a way to "maually" code this into the hurdle model as follows hurdle.fit <- hurdle(tot_f ~ x1 + x2 + x3 | approach_km + I(pmax(approach_km-3.849,0)) + sea ) When I look at the estimated coefficients from the "manual" code it gives the same values. However, the std.errors are estimated lower. Zero hurdle model coefficients (binomial with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) -0.274410.29347 -0.9350.350 approach_km -0.452610.09993 -4.529 5.92e-06 *** I(pmax(approach_km - 3.849, 0)) 0.454860.10723 4.242 2.22e-05 *** sea2 0.362710.22803 1.5910.112 Question # 1: Does the hurdle equation use the standard errors from the zero model when building the count predictions? If no then I guess I would not have to worry about this and can just report the original std.errors and associated p values from the segemented object in the pub. Question # 2: If the count model uses the std.errors, how can I reformulate this equation to generate the original std.errors. Thanks, Tim [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.