[R] Extract values from a predict() result... how?
Hello, silly question I suppose, but somehow I can't manage to extract the fitted values from a predict() result on a glm fit:

> str(res)
 Named num [1:9] 0.00814 0.01877 0.025 0.02941 0.03563 ...
 - attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...

which I got from:

# A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
model <- glm(lot1 ~ log(u), data = clotting, family = Gamma)
res <- predict(model, clotting)

I want to transfer the values "0.00814 0.01877 0.025 0.02941 0.03563 ..." to a separate vector. How do I do this?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
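For what it's worth, the result of predict() here is just a named numeric vector, so one way (a minimal sketch, reusing the clotting example above) is simply to drop the names:

```r
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18))
model <- glm(lot1 ~ log(u), data = clotting, family = Gamma)
res <- predict(model, clotting)

vals <- unname(res)             # plain numeric vector, names dropped
# as.vector(res) does the same thing

ids <- as.integer(names(res))   # the names ("1".."9") as integers, if needed
```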
[R] Checking the assumptions for a proper GLM model
Hello, Are there any packages/functions available for testing the assumptions underlying a GLM? Like linktest in Stata and similar. If not, could somebody please describe their work process when they check the validity of a logit/probit model? Regards, Jay
Re: [R] Checking the assumptions for a proper GLM model
So what I'm looking for is readily available tools/packages that could produce some of the following:

3.6 Summary of Useful Commands (Stata; source: http://www.ats.ucla.edu/stat/Stata/webbooks/logistic/chapter3/statalog3.htm)

* linktest -- performs a link test for model specification, in our case to check if logit is the right link function to use. This command is issued after the logit or logistic command.
* lfit -- performs a goodness-of-fit test; calculates either the Pearson chi-square or the Hosmer-Lemeshow chi-square goodness-of-fit statistic, depending on whether the group option is used.
* fitstat -- a post-estimation command that computes a variety of measures of fit.
* lsens -- graphs sensitivity and specificity versus probability cutoff.
* lstat -- displays summary statistics, including the classification table, sensitivity, and specificity.
* lroc -- graphs and calculates the area under the ROC curve based on the model.
* listcoef -- lists the estimated coefficients for a variety of regression models, including logistic regression.
* predict dbeta -- Pregibon delta-beta influence statistic
* predict deviance -- deviance residual
* predict dx2 -- Hosmer and Lemeshow change-in-chi-square influence statistic
* predict dd -- Hosmer and Lemeshow change-in-deviance statistic
* predict hat -- Pregibon leverage
* predict residual -- Pearson residuals, adjusted for the covariate pattern
* predict rstandard -- standardized Pearson residuals, adjusted for the covariate pattern
* ldfbeta -- influence of each individual observation on the coefficient estimate (not adjusted for the covariate pattern)
* graph with the [weight=some_variable] option
* scatlog -- produces a scatter plot for logistic regression.
* boxtid -- performs power transformation of independent variables and a nonlinearity test.

But, since I'm new to GLM, I would greatly appreciate hearing how you/others go about testing the validity of a GLM model.
On Feb 18, 1:18 am, Jay wrote:
> Hello,
>
> Are there any packages/functions available for testing the assumptions
> underlying a GLM? Like linktest in Stata and similar. If not, could
> somebody please describe their work process when they check the
> validity of a logit/probit model?
>
> Regards,
> Jay
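Not a complete answer, but several of the Stata items above have rough base-R counterparts; a sketch on a built-in dataset (hoslem.test and roc live in the contributed ResourceSelection and pROC packages, shown commented out since they may not be installed):

```r
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

dev.res  <- residuals(fit, type = "deviance")  # cf. predict deviance
pear.res <- residuals(fit, type = "pearson")   # cf. predict residual
lev      <- hatvalues(fit)                     # cf. predict hat (leverage)
infl     <- dfbeta(fit)                        # per-observation influence on coefficients

# goodness of fit / ROC, if the packages are available:
# ResourceSelection::hoslem.test(fit$y, fitted(fit))   # cf. lfit
# pROC::roc(mtcars$am, fitted(fit))                    # cf. lroc
```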
Re: [R] Checking the assumptions for a proper GLM model
Well, yes and no. Obviously I was not asking for a complete recap of all the theory on the subject. My main concern is finding readily available CRAN functions and packages that would help me in the process. I've found the UCLA site to be very informative and spent a lot of time there the last couple of days. However, their section on using R for validating the assumptions is very lacking. Naturally links like google.com and amazon.com will eventually get me there, but if somebody has other recommendations, I would be very grateful for even more help. BR, Jay

On Feb 18, 7:01 pm, David Winsemius wrote:
> At one time the "answer" would have been to buy a copy of Venables and
> Ripley's "Modern Applied Statistics with S" (and R), and that would
> still be a sensible strategy. There are now quite a few other R-centric
> texts that have been published in the last few years. Search Amazon if
> needed. You seem to be asking for a tutorial on general linear modeling
> (which, if you read the Posting Guide, you will find is not a service
> offered by the r-help list.) Perhaps you should have edited the link
> you provided in the obvious fashion:
>
> http://www.ats.ucla.edu/stat/R/
>
> Perhaps one of these pages: http://www.ats.ucla.edu/stat/R/dae/default.htm
>
> The UCLA Statistics website used to be dismissive of R, but they more
> recently appear to have seen the light. There is also a great amount
> of contributed teaching material on CRAN:
>
> http://cran.r-project.org/other-docs.html
>
> ... and more would be readily available via Googling with "r-project"
> as part of a search strategy. Frank Harrell's material is in particular
> quite useful:
>
> http://biostat.mc.vanderbilt.edu/wiki/Main/StatComp
>
> --
> David.
> > On Feb 18, 2010, at 8:32 AM, Jay wrote:
> > [...]
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
[R] Extract information from S4 object
The function prediction() returns this:

Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "Cutoff"
  ..@ y.name      : chr "Accuracy"
  ..@ alpha.name  : chr "none"
  ..@ x.values    : List of 1
  .. ..$ : Named num [1:89933] Inf 2.23 2.22 2.17 2.16 ...
  .. .. ..- attr(*, "names")= chr [1:89933] "" "36477" "56800" "41667" ...
  ..@ y.values    : List of 1
  .. ..$ : num [1:89933] 0.5 0.5 0.5 0.5 0.5 ...
  ..@ alpha.values: list()

Now, since I want to match each prediction with its original case, I need to extract the names, i.e. the information in '- attr(*, "names")= chr [1:89933] "" "36477" "56800" "41667" ...', so I can use it with a simple datafile[names,] query. How do I get these names in plain number format?
Re: [R] Extract information from S4 object
Thank you. But when I try the command you both suggested I get NULL as the result:

> names(object1@x.values)
NULL

Where did I go wrong?

On Feb 22, 4:34 pm, David Winsemius wrote:
> On Feb 22, 2010, at 8:05 AM, Jay wrote:
> > [...]
>
> Not sure what you mean by "plain number formats", but this should get
> you a vector of "names", assuming the prediction object is named
> "predobject":
>
> names(predobject@x.values)
>
> If you wanted them "as.numeric", then that is the name of the
> appropriate function.
>
> --
> David
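For the record, the NULL arises because the x.values slot is itself a length-one, unnamed list; the names sit on the numeric vector inside it. A toy stand-in for the slot structure:

```r
# a list holding one named numeric vector, like the x.values slot
xvals <- list(c("36477" = 2.23, "56800" = 2.22, "41667" = 2.17))

names(xvals)        # NULL: the list itself has no names
names(xvals[[1]])   # the names are on the vector inside

# so with a real ROCR performance object 'object1' the call would be:
# names(object1@x.values[[1]])
ids <- as.numeric(names(xvals[[1]]))
```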
[R] Cross-validation for parameter selection (glm/logit)
If my aim is to select a good subset of parameters for my final logit model built using glm(), what is the best way to cross-validate the results so that they are reliable? Let's say that I have a large dataset of 1000s of observations. I split this data into two groups, one that I use for training and another for validation. First I use the training set to build a model, and then stepAIC() with a forward-backward search. BUT, if I base my parameter selection purely on this result, I suppose it will be somewhat skewed due to the one-time data split (I use only one training dataset). What is the correct way to perform this variable selection? And are there readily available packages for this? Similarly, when I have my final parameter set, how should I go about making the final assessment of the model's predictive performance? CV? What package? Thank you in advance, Jay
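For the assessment step, one readily available option (a sketch, not a full answer to the selection-bias point) is k-fold cross-validation via cv.glm in the boot package; for honest error estimates the stepAIC() selection itself would have to be repeated inside each fold:

```r
library(boot)

fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# misclassification rate at a 0.5 cutoff as the cost function
cost <- function(y, p) mean(abs(y - p) > 0.5)

set.seed(1)
cv <- cv.glm(mtcars, fit, cost = cost, K = 10)
cv$delta[1]   # 10-fold cross-validated error estimate
```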
[R] xyplot ontop a contourplot (package: lattice)
Hello, I have a contourplot that shows the data I want. However, I would like to plot a certain number of points on top of it. Example:

x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
r <- as.vector(sqrt(outer(x^2, y^2, "+")))
grid <- expand.grid(x=x, y=y)
grid$z <- cos(r^2) * exp(-r/(pi^3))
levelplot(z~x*y, grid, cuts = 50, panel.xyplot(x~y))

But the point does not show up. What is the correct way to achieve this?
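The usual lattice idiom is a custom panel function that draws the surface first and then adds the points; a sketch reusing the example above (the overlay coordinates are made up for illustration):

```r
library(lattice)

x <- seq(pi/4, 5 * pi, length.out = 100)
y <- seq(pi/4, 5 * pi, length.out = 100)
r <- as.vector(sqrt(outer(x^2, y^2, "+")))
grid <- expand.grid(x = x, y = y)
grid$z <- cos(r^2) * exp(-r / (pi^3))

pts <- data.frame(x = c(4, 8), y = c(4, 8))  # points to overlay (illustrative)

p <- levelplot(z ~ x * y, grid, cuts = 50,
               panel = function(...) {
                 panel.levelplot(...)                 # the surface
                 panel.points(pts$x, pts$y,           # the overlay
                              pch = 19, col = "black")
               })
p
```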
[R] Check for overdispersion in logit model
A quick question for those that are familiar with the subject: is it OK to check for overdispersion in a logit model using

sum(resid(model, type = "pearson")^2) / df.residual(model)

Are there other commands? Packages? /Jay
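That ratio is the usual quick check. A self-contained sketch; the same quantity is what summary() of a quasibinomial refit reports as the dispersion estimate (note that with ungrouped binary data this check is of limited use):

```r
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# Pearson chi-square / residual df; values far above 1 suggest overdispersion
disp <- sum(resid(fit, type = "pearson")^2) / df.residual(fit)

# cross-check: the dispersion reported by a quasibinomial refit
qfit <- update(fit, family = quasibinomial)
summary(qfit)$dispersion
```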
[R] Positioning plots on top of each other (alignment & borders)
Hello, I want to place two plots on top of each other. However, the problem is that I can't figure out a simple way to align them correctly. Is there a way to specify this? Since the data is a bunch of coordinates and the second layer is an outline of a map (a .ps file I import using the grImport package), I suppose one option would be to specify a set of "artificial" coordinates that make up the very corners of that plot, and then have the second layer fill this same space. Any ideas how to do this? //John
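One base-graphics sketch of the "artificial corners" idea: draw the first layer, then overdraw with par(new = TRUE), forcing both layers onto identical, unpadded axes so they align (the corner coordinates here are illustrative assumptions):

```r
set.seed(1)
xy <- data.frame(x = runif(20, 0, 10), y = runif(20, 0, 10))

# a common coordinate box shared by both layers
xlim <- c(0, 10); ylim <- c(0, 10)

plot(xy$x, xy$y, xlim = xlim, ylim = ylim, pch = 19,
     xaxs = "i", yaxs = "i")            # "i" = no axis padding, exact corners

par(new = TRUE)                         # next plot overdraws the same region
plot(NA, xlim = xlim, ylim = ylim, xaxs = "i", yaxs = "i",
     axes = FALSE, xlab = "", ylab = "")
# ...here the map outline (e.g. imported via grImport) would be drawn...
```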
[R] xyplot: several plots in one creates y-scale problem
Hello, I've been looking for a solution to this problem for some time now but I seem unable to solve it. So this is the case: I want to plot 4 time series in the same graph using xyplot(). When I do this with

xyplot(mydata[,2]+mydata[,3]+mydata[,4]+mydata[,5] ~ mydata[,1],
       data = mydata, type = "l",
       auto.key = list(space="right", lines = T, points = F),
       par.settings = simpleTheme(lty = c(1,2,3,4)))

I get a graph where all lines are "maximized" to cover the entire y-scale width. I.e., they each use their own scale, independent of each other (my data has some columns that are one magnitude smaller than the others). How do I force them all to use the same y-scale? I found this thread: http://n4.nabble.com/superimposing-xyplots-on-same-scale-td905525.html, but I'm not really sure what is going on there. Any ideas? /J
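A sketch of the usual workaround: reshape the data to long format and use a groups= argument, which puts all series in one panel on one common y scale (the column names and magnitudes here are illustrative):

```r
library(lattice)

# toy data: one time column, four series of very different magnitude
set.seed(1)
mydata <- data.frame(time = 1:20,
                     a = rnorm(20, 100), b = rnorm(20, 10),
                     c = rnorm(20, 1),   d = rnorm(20, 1000))

long <- reshape(mydata, direction = "long",
                varying = c("a", "b", "c", "d"), v.names = "value",
                timevar = "series", times = c("a", "b", "c", "d"))

p <- xyplot(value ~ time, data = long, groups = series, type = "l",
            auto.key = list(space = "right", lines = TRUE, points = FALSE))
p
```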
[R] xyplot: problems with column names & legend
Hello! One more question about xyplot. If I have data which has spaces in the column names, say "xyz 123", how do I create a working graph where this text is displayed in the legend key? Now when I try something like xyplot("xyz 123" ~ variable1, data = mydata, ...) I get nothing. Also, is it possible to generate the graph with xyplot(mydata[,1] ~ variable1, data = mydata, ...) and then later in the code specify the names that should be displayed in the legend? Thank you!
Re: [R] xyplot: problems with column names & legend
Thanks, the backticks got the code working. However, now I can't get it to draw the legend/key. For example, look at this figure: http://osiris.sunderland.ac.uk/~cs0her/Statistics/xyplot5.png My graph is similar, but instead of 1,2,...,8 as the names of the series I want it to say "Data one" (a string with spaces) and so on.

On Jan 3, 10:58 am, baptiste auguie wrote:
> Hi,
>
> Using backticks might work to some extent,
>
> library(lattice)
> `my variable` = 1:10
> y = rnorm(10)
> xyplot(`my variable` ~ y)
>
> but if your data is in a data.frame the names should have been converted:
>
> make.names('my variable')
> [1] "my.variable"
>
> HTH,
>
> baptiste
>
> 2010/1/3 Jay:
> > [...]
Re: [R] xyplot: problems with column names & legend
Anybody? Frustrating to be unable to solve this silly little problem...

On Jan 3, 12:48 pm, Jay wrote:
> Thanks, the backticks got the code working. However, now I can't get
> it to draw the legend/key. For example, look at this figure:
> http://osiris.sunderland.ac.uk/~cs0her/Statistics/xyplot5.png
> My graph is similar, but instead of 1,2,...,8 as the names of the
> series I want it to say "Data one" (a string with spaces) and so on.
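For the record, the key labels can be set directly, independent of the column names, via the text component of auto.key (the labels and toy data here are illustrative):

```r
library(lattice)

set.seed(1)
d <- data.frame(x = 1:10, `Data one` = rnorm(10), `Data two` = rnorm(10),
                check.names = FALSE)   # keep the spaces in the names

p <- xyplot(`Data one` + `Data two` ~ x, data = d, type = "l",
            auto.key = list(space = "right", lines = TRUE, points = FALSE,
                            text = c("Data one", "Data two")))
p
```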
[R] xyplot: adjusting the scale (min, max & tick)
Hi, I'm terribly sorry, but it seems I cannot figure this one out by myself, so please, if somebody could help I would be very grateful. When I plot with xyplot() I get a y-axis that is very ugly... starting from a random number and having so many ticks that it becomes unreadable. How do I tell xyplot how to draw the axis? E.g., start from 100, end at 200, with 25 units between ticks/labels? Can somebody give me an example? Thanks!
Re: [R] xyplot: adjusting the scale (min, max & tick)
Perfect, that piece of code did exactly what I wanted. However, I stumbled upon a new problem: now my data is plotted on a totally wrong scale. The y-values are all between 160k and 500k, BUT with that option I find that the plots are between 0 and 50 (?!?). What did I do wrong? This plots the data OK, even though it should be between 160k and 500k on the y-scale:

xyplot(data1[,2]+data1[,3]~data1[,1], data = data1, type = "l",
       xlab = "x", ylab = "y",
       auto.key = list(space="top", lines = T, points = F,
                       text=c("text1", "text2")),
       par.settings = simpleTheme(lty = c(1,2)),
       scales=list(x=list(alternating=FALSE, tick.number = 11),
                   y=list(limits=c(0,50))))

If I remove the y=list(limits=c(0,50)) the data is plotted as it should.

Peter Ehlers wrote:
> Have a look at the 'scales' argument. For example:
>
> # default plot
> xyplot(Sepal.Length ~ Petal.Length | Species, data = iris)
>
> # modified plot
> xyplot(Sepal.Length ~ Petal.Length | Species, data = iris,
>        scales=list(y=list(at=c(-5,0,5,10), limits=c(-5,10))))
>
> -Peter Ehlers
>
> Jay wrote:
>> [...]

--
Peter Ehlers
University of Calgary
403.202.3921
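For reference, a self-contained version of the scales approach on built-in data (here: ticks every 25 units between explicit limits, matching the original question):

```r
library(lattice)

set.seed(1)
d <- data.frame(x = 1:30, y = runif(30, 110, 190))

p <- xyplot(y ~ x, data = d, type = "l",
            scales = list(y = list(at = seq(100, 200, by = 25),
                                   limits = c(100, 200))))
p
```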
Re: [R] Results from clogit out of range?
I'm not positive of the question you are asking because I lost some of the initial messages in the thread, but I think

predict(model, type="expected")

gives fitted probabilities. Apologies if I answered a question no one asked.

On Feb 28, 2013, at 7:45 PM, lisa wrote:
> I do appreciate this answer. I heard that in SAS, conditional logistic models
> do predictions in the same way. However, this formula can only deal with
> in-sample predictions. How about the out-of-sample one? Is it like one of the
> former responses by Thomas, say, it's impossible to do the out-of-sample
> prediction?
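A minimal sketch of that suggestion with survival::clogit (survival ships with R; infert is the built-in dataset used on the clogit help page):

```r
library(survival)

fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)

# in-sample fitted values under the conditional model
p <- predict(fit, type = "expected")
range(p)
```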
[R] nnclust: nnfind() distance metric?
Hello, pardon my ignorance, but what distance metric is used by nnfind() in the nnclust package? The manual only says: "Find the nearest neighbours of points in one data set from another data set. Useful for Mallows-type distance metrics." BR, Jay
[R] Lattice: location of key inside a xyplot()
Hello, adding the line

space = 'inside'

to my xyplot() call provides a partial solution to the problem I have. However, this puts the key in the upper left corner, which is not where I want it. The help page lists no additional options for "inside", but surely this must be possible somehow? BR, Jay
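For reference, an inside key can be positioned with the x, y, and corner components of auto.key; a sketch placing it in the upper right (toy data, illustrative position):

```r
library(lattice)

set.seed(1)
d <- data.frame(x = rep(1:10, 2), y = rnorm(20),
                g = rep(c("A", "B"), each = 10))

p <- xyplot(y ~ x, data = d, groups = g, type = "l",
            auto.key = list(x = 0.98, y = 0.98, corner = c(1, 1),
                            lines = TRUE, points = FALSE))
p
```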
[R] RODBC: overwrite into a named range in Excel
Hello, let's say that I have a data frame of n numbers I want to transfer into an Excel spreadsheet. I have opened the connection to the file using ODBC, and I can query the content of these n cells without problem. However, how do I transfer my new values to these cells, i.e., overwrite them? Should I use sqlSave() or sqlUpdate()? Using the update I get the error "cannot update 'data_001' without unique column" from

sqlUpdate(connection_name, my_new_data_frame, "name_of_the_range_in_excel")

Let's say that the range in Excel is in E10:E20 (if it matters). BR, Jay
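I can't test against a live workbook here, so this is only a hedged sketch of the documented pattern: sqlUpdate() needs a key column it can match rows on, supplied via index= (the connection call, the range name, and the id column are all illustrative assumptions):

```r
# library(RODBC)
# con <- odbcConnectExcel2007("workbook.xlsx", readOnly = FALSE)

# the data frame must carry the key column plus the columns to overwrite
# new_data <- data.frame(id = 1:11, value = rnorm(11))

# sqlUpdate(con, new_data, tablename = "name_of_the_range_in_excel",
#           index = "id")   # names the unique key column
# close(con)
```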
[R] Shared nearest neighbor (SNN) clustering algorithm implementation?
Hello, is there an implementation available for a shared nearest neighbor (SNN) clustering algorithm? //Jay
[R] Fuzzy Discriminant Analysis (DA)
Hello all, as the combination of DA and R is rather new to me I would like to know: are there packages that implement a fuzzy version of Discriminant Analysis? Thanks!
[R] kohonen: "Argument data should be numeric"
Hi, I'm trying to use the kohonen package to build SOMs. However, trying this on my data I get the error "Argument data should be numeric" when running

som(data.train, grid = somgrid(6, 6, "hexagonal"))

As you see, there is a problem with the data type of data.train, which is a list. When I try to convert it to "numeric" I get the error: (list) object cannot be coerced to type 'double'. What should I do? I can convert data.train if I take only one column of the list, data.train[[1]], but that is naturally not what I want. How did I end up with this data format? What I did:

data1 <- read.csv("data1.txt", sep = ";")
training <- sample(nrow(data1), 1000)
data.train <- data1[training,2:20]

I tried to use scan as the import method (read about this somewhere) and unlist, but I'm not really sure how I should get it to numeric/working. Thanks, Jay
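The likely cause: a data frame is internally a list of columns, and som() wants a numeric matrix. A sketch of the conversion on built-in data (assumes all selected columns really are numeric; factor columns would need handling first):

```r
# the som() function needs a numeric matrix; a data frame (a list of
# columns) triggers "Argument data should be numeric"
data1 <- as.data.frame(state.x77)           # built-in, all-numeric example

set.seed(1)
training <- sample(nrow(data1), 30)
data.train <- as.matrix(data1[training, ])  # data.frame -> numeric matrix

is.numeric(data.train)   # TRUE

# then, with the kohonen package installed:
# library(kohonen)
# som(scale(data.train), grid = somgrid(6, 6, "hexagonal"))
```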
[R] ggplot2: "ndensity" and "density" parameters
Hello, if I want to compare the distributions of two datasets using ggplot2, how should I choose the density type? More exactly, what assumptions are behind the "ndensity" and "density" parameters, and when should each be used? See http://had.co.nz/ggplot2/stat_bin.html While I understand that one is scaled and the other one is not, I do not understand which one I should rely on. The distributions look very different when I try both alternatives. Thanks
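For what it's worth, a sketch contrasting the two stat_bin computed variables (density integrates to 1 over the histogram, so areas are comparable across samples of different size; ndensity rescales so the tallest bar is 1, so only peak heights are comparable). This uses current ggplot2 after_stat() syntax, assuming the package is installed:

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(1000))

# bar areas sum to 1: comparable across differently sized samples
p1 <- ggplot(d, aes(x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# tallest bar rescaled to 1: comparable peak heights, not areas
p2 <- ggplot(d, aes(x)) +
  geom_histogram(aes(y = after_stat(ndensity)), bins = 30)
```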
[R] Extracting columns with specific string in their names
Hi, let's say that I have a set of column names that begin with the string "Xyz". How do I extract these specific columns? I tried the following:

dataframe1[,grep("Xyz",colnames(dataframe1))]

But it does not work. What is wrong with my expression?
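For reference, a self-contained version of that pattern, anchored with ^ so it matches only names that begin with the string (toy column names):

```r
dataframe1 <- data.frame(Xyz_a = 1:3, Xyz_b = 4:6, other = 7:9)

sel <- dataframe1[, grep("^Xyz", colnames(dataframe1)), drop = FALSE]
names(sel)   # "Xyz_a" "Xyz_b"
```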
Re: [R] Extracting columns with specific string in their names
Sorry, my mistake. The thing is that the command returns no results at all. However, when I just tried a simpler version of this (with no capital letters or spaces in the string), it worked fine. I can't figure it out; I think it all boils down to the fact that I'm no expert at regexps...

On Aug 22, 5:53 pm, "R. Michael Weylandt" wrote:
> Can you say a little more about what you mean by "it does not work"? I'd
> guess you have a regular expression mistake and are probably getting more
> columns than desired, but without an example, it's hard to be certain.
>
> Use dput() and head() to give a small cut-and-paste-able example.
>
> Michael
>
> On Mon, Aug 22, 2011 at 10:33 AM, Jay wrote:
> > [...]
[R] rpart: plot without scientific notation
While I'm very pleased with the results I get with rpart and rpart.plot, I would like to change the scientific notation of the dependent variable in the plots into integers. Right now all my numbers with 5 or more digits are displayed using scientific notation. I managed to find this: http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8423.html but I do not fully understand what to change, and to what.
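One commonly suggested workaround (a sketch of the general mechanism, not rpart.plot-specific) is to raise the scipen option before plotting, which biases R's format() routines away from scientific notation in labels:

```r
old <- options(scipen = 999)   # penalize scientific notation in formatting

lab <- format(100000)          # "100000"; with the default scipen = 0
                               # this would be "1e+05"

options(old)                   # restore the previous setting
```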
[R] Decision tree with the group median as response?
As I am only familiar with the basics of decision trees, I would like to ask, at the risk of asking a silly question: is it possible to perform recursive partitioning with the group median as the response/objective? For example, instead of rpart focusing on means, could a similar tree be created with medians? Thanks
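For what it's worth, rpart documents user-written split functions (see its "User Written Split Functions" vignette). Below is a sketch of that mechanism with the node label as the median and the node "deviance" as the sum of absolute deviations; treat the exact interface (the init/eval/split list and their return values) as an assumption to verify against the vignette.

```r
library(rpart)

# init: declare one response column and one fitted value per node
medinit <- function(y, offset, parms, wt) {
  list(y = y, parms = parms, numresp = 1, numy = 1,
       summary = function(yval, dev, wt, ylevel, digits)
         paste("median =", format(yval, digits = digits)))
}

# eval: node estimate = median, node impurity = total absolute deviation
medeval <- function(y, wt, parms) {
  m <- median(y)
  list(label = m, deviance = sum(abs(y - m)))
}

# split: goodness of each cutpoint = reduction in total absolute deviation
# (for continuous x, rpart supplies y already ordered by x)
medsplit <- function(y, wt, x, parms, continuous) {
  n <- length(y)
  total <- sum(abs(y - median(y)))
  goodness <- numeric(n - 1)
  for (i in 1:(n - 1)) {
    l <- y[1:i]; r <- y[(i + 1):n]
    goodness[i] <- max(0, total - sum(abs(l - median(l))) -
                            sum(abs(r - median(r))))
  }
  list(goodness = goodness, direction = rep(-1, n - 1))
}

fit <- rpart(Girth ~ Height + Volume, data = trees,
             method = list(init = medinit, eval = medeval, split = medsplit))
fit
```

This sketch only handles continuous predictors; categorical splits need the extra handling described in the vignette.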
[R] rpart: apply tree to new data to get "counts"
Hi, when I have made a decision tree with rpart, is it possible to "apply" this tree to a new set of data in order to find out the distribution of observations? Ideally I would like to plot my original tree with the counts (at each node) of the new data. Regards, Jay
Re: [R] rpart: apply tree to new data to get "counts"
I tried that, though I find the documentation a bit short, but the only result I get from it is a probability distribution of my data (I'm building a tree with 2 classes). How do I plot a tree where the counts are shown in each step/node? BR, Jay On Aug 29, 9:40 pm, Weidong Gu wrote: > ?predict.rpart > > Weidong Gu > > On Mon, Aug 29, 2011 at 12:49 PM, Jay wrote: > > Hi, > > when I have made a decision tree with rpart, is it possible to "apply" this tree to a new set of data in order to find out the distribution of observations? Ideally I would like to plot my original tree, with the counts (at each node) of the new data. > > Regards, > Jay
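One way to get per-node counts for new data is via the partykit package (a sketch, assuming partykit's rpart converter covers your tree type): convert the rpart fit to a party object, ask predict() for terminal node ids, and tabulate them.

```r
library(rpart)
library(partykit)

# fit on part of iris, hold the rest back as "new" data
fit     <- rpart(Species ~ ., data = iris[1:100, ])
newdata <- iris[101:150, ]

pfit  <- as.party(fit)                           # rpart -> partykit form
nodes <- predict(pfit, newdata, type = "node")   # terminal node id per row
table(nodes)                                     # new-data counts per leaf

plot(pfit)   # partykit's plot shows per-node distributions (training data)
```

Annotating the plotted tree with the new-data counts themselves takes custom panel functions, but the `table(nodes)` output already gives the distribution per node.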
[R] On-line machine learning packages?
What R packages are available for performing on-line classification tasks? That is, after the predictor has done its job on the dataset (based on the training set and a range of variables), feedback about the true label becomes available, and this information should be incorporated before the next classification round. //Jay
Re: [R] On-line machine learning packages?
Hi, I used the rseek search engine to look for suitable solutions, but as I was unable to find anything useful, I'm asking for help. Does anybody have experience with these kinds of problems? I looked into dynaTree, but information is a bit scarce and, as I understand it, it might not be what I'm looking for..(?) BR, Jay On Sep 11, 7:15 pm, David Winsemius wrote: > On Sep 11, 2011, at 11:42 AM, Jay wrote: > > What R packages are available for performing classification tasks? That is, when the predictor has done its job on the dataset (based on the training set and a range of variables), feedback about the true label will be available and this information should be integrated for the next classification round. > > You should look at CRAN Task Views. Extremely easy to find from the main R-project page. > > -- > David Winsemius, MD > West Hartford, CT
Re: [R] On-line machine learning packages?
If the answer is so obvious, could somebody please spell it out? On Sep 11, 10:59 pm, Jason Edgecombe wrote: > Try this: > > http://cran.r-project.org/web/views/MachineLearning.html > > On 09/11/2011 12:43 PM, Jay wrote: > > Hi, > > I used the rseek search engine to look for suitable solutions, but as I was unable to find anything useful, I'm asking for help. Does anybody have experience with these kinds of problems? I looked into dynaTree, but information is a bit scarce and, as I understand it, it might not be what I'm looking for..(?) > > BR, > > Jay > > On Sep 11, 7:15 pm, David Winsemius wrote: > >> You should look at CRAN Task Views. Extremely easy to find from the main R-project page. > >> -- > >> David Winsemius, MD > >> West Hartford, CT
Re: [R] On-line machine learning packages?
In my mind this sequential classification task with feedback is somewhat different from a completely offline, one-off classification. Am I wrong? However, it looks like the mentality in this topic is to refer me to CRAN/Google to look for solutions myself. Obviously I know about these sources and, as I said, I used rseek.org among other sources to look for solutions. I did not start this topic for fun; I'm asking for help to find a suitable machine learning package that readily incorporates feedback loops and online learning. If somebody has experience with these kinds of problems in R, please respond. Or will "http://cran.r-project.org Look for 'Task Views'" be my next piece of advice? On Sep 12, 11:31 am, Dennis Murphy wrote: > http://cran.r-project.org/web/views/ > > Look for 'machine learning'. > > Dennis > > On Sun, Sep 11, 2011 at 11:33 PM, Jay wrote: > > If the answer is so obvious, could somebody please spell it out? [...]
Re: [R] On-line machine learning packages?
How does sequential classification differ from running a one-off classifier for each run? -> Because feedback from the previous round can, and needs to, be incorporated into the next round. http://lmgtfy.com/?q=R+machine+learning -> That is a new low. I was hoping to get help; obviously I was wrong to use this forum in the hope that somebody had already battled these kinds of problems in R. On Sep 13, 1:52 am, Jason Edgecombe wrote: > I already provided the link to the task view, which lists the more popular machine learning algorithms for R. > > Do you have a particular algorithm or technique in mind? Does it have a name? > > How does sequential classification differ from running a one-off classifier for each run? > > On 09/12/2011 05:24 AM, Jay wrote: > > In my mind this sequential classification task with feedback is somewhat different from a completely offline, one-off classification. Am I wrong? [...]
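For what it's worth, in the absence of a dedicated online-learning package, the feedback loop described in this thread can be approximated by refitting a classifier each time a true label arrives. A plain sketch with base R's glm (all data here is simulated for illustration; refitting from scratch each round is the naive approach, not a clever incremental one):

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))   # true labels, revealed one at a time
train <- data.frame(x = x[1:50], y = y[1:50])

correct <- 0
for (i in 51:n) {
  # classify the new observation with the current model
  fit  <- glm(y ~ x, family = binomial, data = train)
  p    <- predict(fit, data.frame(x = x[i]), type = "response")
  pred <- as.integer(p > 0.5)
  correct <- correct + (pred == y[i])

  # feedback: the true label becomes available and joins the training set
  train <- rbind(train, data.frame(x = x[i], y = y[i]))
}
correct / (n - 50)   # running accuracy over the sequential rounds
```

Any classifier with a fit/predict interface can be swapped in; truly incremental updates (rather than refits) depend on the model family.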
[R] Factor analysis on ordinal & nominal data
Hi, are there readily available R packages that can perform FA on ordinal and/or nominal data? If not, what other approaches and helpful packages would you suggest? BR, Jay
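For the ordinal case, one commonly suggested route (a sketch; check the psych package documentation before relying on the argument names) is factor analysis based on polychoric rather than Pearson correlations:

```r
library(psych)

# toy ordinal data: four 5-point Likert-style items, invented for illustration
set.seed(1)
d <- as.data.frame(replicate(4, sample(1:5, 200, replace = TRUE)))

# ask fa() to base the analysis on polychoric correlations
res <- fa(d, nfactors = 1, cor = "poly")
print(res$loadings)
```

Purely nominal variables need a different family of methods (e.g. multiple correspondence analysis or latent class models) since polychoric correlations assume ordered categories.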
[R] Count occurrences in integers (or strings)
Hi, I have a dataframe column in which I want to count the number of 1's in each entry. Some column values could, for example, be "0001001000" and "111". To count the occurrences in a string I use this: sum(unlist(strsplit(mydata[,"my_column"], "")) == "1") However, my data is not in string form. How do I convert it? I tried: lapply(mydata[,"my_column"], toString) but I do not seem to get it right (or at least I do not understand the output format). Also, are there other options? Can I easily count the occurrences directly from the integers?
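A sketch of one vectorized option (column values invented for illustration). One caveat worth stating: converting a number such as 0001001000 to character loses the leading zeros, so if those digits matter the data should be stored as strings in the first place.

```r
x    <- c("0001001000", "111", "0")    # string form
xnum <- c(1001000, 111, 0)             # numeric form (leading zeros lost)

# count the 1s per element: delete every character that is not a "1",
# then measure what is left
count_ones <- function(v) nchar(gsub("[^1]", "", as.character(v)))

count_ones(x)      # 2 3 0
count_ones(xnum)   # 2 3 0
```

`as.character()` is the per-element conversion the post was looking for; `toString()` instead collapses a whole vector into one comma-separated string, which explains the confusing output.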
[R] split data, but ensure each level of the factor is represented
Hello, I'll use part of the iris dataset for an example of what I want to do.

> data(iris)
> iris <- iris[1:10, 1:4]
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1

Now if I want to split this data using the vector

> a <- c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 1 2 3 2 3

then the function split works fine:

> split(iris, a)
$`1`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
6          5.4         3.9          1.7         0.4

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

My problem is when the vector lacks one of the values from 1:n. For example, if the vector is

> a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 2 2 3 2 3

then split will return a list without a $`1`. I would like $`1` to be a vector of 0's with the same length as the number of columns in the dataset. In other words I want to write a function that returns

> mysplit(iris, a)
$`1`
[1] 0 0 0 0 0

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
6          5.4         3.9          1.7         0.4
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

Thank you for your time, Jay
Re: [R] split data, but ensure each level of the factor is represented
Thanks so much. On Oct 13, 1:14 pm, "Henrique Dallazuanna" <[EMAIL PROTECTED]> wrote: > Try this: > > a <- factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3) > split(iris, a) > > lapply(split(iris, a), dim) > > On Mon, Oct 13, 2008 at 2:06 PM, Jay <[EMAIL PROTECTED]> wrote: > > Hello, > > I'll use part of the iris dataset for an example of what I want to do. [...]
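Building on that answer, a sketch of a mysplit() that also substitutes the requested zero vector for empty levels (the function name comes from the original post; making the fallback length equal to ncol is an assumption taken from the post's description):

```r
mysplit <- function(df, a, levels = 1:max(a)) {
  # split on a factor with explicit levels so empty groups are kept
  s <- split(df, factor(a, levels = levels))
  # replace each empty group by a vector of zeros, one per column
  lapply(s, function(d) if (nrow(d) == 0) rep(0, ncol(df)) else d)
}

data(iris)
iris10 <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)   # level 1 never occurs
str(mysplit(iris10, a)$`1`)            # num [1:4] 0 0 0 0
```

The key step is the one from the reply above: `factor(a, levels = ...)` forces split() to keep the empty groups, after which the zero-filling is a simple lapply.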
[R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities
Dear sir/madam, I am currently writing a meta-analysis on the complication and reoperation rates of 5 different treatment modalities after a distal radius fracture. I was able to pool the rates of the 5 modalities using R. Now I have to compare the pooled rates of 4 of the treatment modalities with the gold standard separately. I thought the chi-squared test would be the best method. How do I do that using R? The R code I used for the former calculation is attached as a Word file. Your help would be highly appreciated. Yours sincerely, Student
Re: [R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities
Dear sir/madam, Currently I am writing a meta-analysis on complications and reoperations of 5 treatment modalities after an extra-articular distal radius fracture. The treatment modalities are EF, IMN, KW, VPO, and PC as the gold standard. We have included 22 studies (10 RCTs and 12 prospective studies), all examining different treatment methods. We retrieved the data from these studies and pooled the complication and reoperation rates (n/N). Now we want to compare the pooled proportion of each treatment modality to the gold standard of PC (plaster casting), so I want to do 4 separate comparisons using the chi-squared method. I looked it up online and in the guide for the meta package (the package I used), but wasn't able to find useful information. I first posted my question on the stats.stackexchange website but was redirected to the r-help mailing list. I have added a picture of the most important parts of the code (not the Egger's regression, funnel, trim-and-fill, and outcome.pdf parts, because they didn't fit). I have added the data as Excel and SPSS files to my Dropbox, and the complete R code to a Word file in my Dropbox as well. The links below will refer you to them, as preferred by the posting guide. Hopefully someone can help me. Thank you very much.
Excel datafile: https://www.dropbox.com/s/19402gt0x1agt9f/Excel%20file%20Distal%20Radius%20Fracture%20basic.xlsx?dl=0
SPSS datafile: https://www.dropbox.com/s/h81pphxkfk74hzo/Meta-Analyse%20Complications%20and%20Reoperations.sav?dl=0
R code (Word file): https://www.dropbox.com/s/67pnfpi10qu110v/R%20code%20voor%20forrest%20en%20funnel%20plots.rtf?dl=0
Earlier post: https://stats.stackexchange.com/questions/286920/comparing-pooled-propotions-using-r-for-a-meta-analysis
From: David Winsemius Sent: Friday, 23 June 2017, 20:18 To: Jay Zola CC: r-help@r-project.org Subject: Re: [R] Comparing pooled proportions (complication and reoperation rates) of different treatment modalities > On Jun 23, 2017, at 5:53 AM, Jay Zola wrote: > > Dear sir/madam, > > I am currently writing a meta-analysis on the complication and reoperation rates of 5 different treatment modalities after a distal radius fracture. [...] The R code I have used for the former calculation are added as a Word-file attachment. Not an acceptable format to the listserv program. Policy is set by the host institution. Use plain text. > Your help would be highly appreciated. > > Yours sincerely, > > Student David Winsemius Alameda, CA, USA
Re: [R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities
What is the best way to change my R code so that I can compare the pooled proportions (complication and reoperation rates) with the chi-squared method? Sent from my iPhone > On 24 Jun 2017, at 14:18, Michael Dewey wrote: > > Note though that this has been put on hold on stats.stackexchange.com as off-topic. > >> On 23/06/2017 19:33, Bert Gunter wrote: >> Probably the wrong list. R-help is concerned with R programming, not statistics methodology questions, although the intersection can be nonempty. >> I suggest you post on stats.stackexchange.com instead, which *is* concerned with statistics methodology questions. >> Cheers, >> Bert >> Bert Gunter >> "The trouble with having an open mind is that people keep coming along and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip) >> >>> On Fri, Jun 23, 2017 at 5:53 AM, Jay Zola wrote: >>> Dear sir/madam, [...]
Re: [R] Comparing pooled proportions(complication and reoperation rates) of different treatment modalities
On 26 Jun 2017, at 15:22, Jay Zola wrote: What is the best way to change my R code so that I can compare the pooled proportions (complication and reoperation rates) with the chi-squared method? Just adding an adjustment to the links because they were not working correctly. Dataset on my Dropbox: https://www.dropbox.com/s/j1urqzr99bt76ip/Basics%20excel%20file%20complication%20and%20reoperation%20rate.xlsx?dl=0 R code on my Dropbox: https://www.dropbox.com/s/67pnfpi10qu110v/R%20code%20voor%20forrest%20en%20funnel%20plots.rtf?dl=0 Sent from my iPhone > On 24 Jun 2017, at 14:18, Michael Dewey wrote: > > Note though that this has been put on hold on stats.stackexchange.com as off-topic. [...]
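Setting the methodology debate aside, mechanically the 2x2 comparison asked about can be done in base R. A sketch (the event counts are invented for illustration; note that applying a plain chi-squared test to pooled proportions ignores between-study heterogeneity, so the result should be interpreted with care):

```r
# events and totals for one modality vs. the plaster-cast gold standard
events <- c(EF = 93, PC = 5)      # invented complication counts
totals <- c(EF = 298, PC = 55)    # invented pooled sample sizes

# chi-squared test of equal proportions
prop.test(x = events, n = totals)

# equivalently, build the 2x2 table (events vs. non-events) explicitly:
tab <- rbind(events, totals - events)
chisq.test(tab)
```

Running this once per modality against PC gives the four separate comparisons described in the thread.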
[R] Model studies in one analysis using treatment as a five level moderator in a meta-regression
Hello, I am a medical student, writing a meta-analysis on complication and reoperation rates after the five most common treatments of distal radius fractures. I have been busy with the statistics for months by myself, but find it quite hard since our classes were very basic. Now I want to compare the treatment modalities to see if there are significant differences. Using R I was able to synthesize the complication rates and reoperation rates for each treatment method. But I never had any R course and managed by trial and error, so the code probably doesn't look that great. Someone told me I could best model the data in one analysis using treatment as a five-level moderator in a meta-regression. Can someone help me with the R code to do this? Your help would be very much appreciated.

Thank you,

Jay

Study| Event Type| Treatment| Number of Events (n)| N| n/N|
Kumaravel| Complications| EF| 3| 23| 0,1304348|
Franck| Complications| EF| 2| 20| 0,1|
Schonnemann| Complications| EF| 8| 30| 0,267|
Aita| Complications| EF| 1| 16| 0,0625|
Hove| Complications| EF| 31| 39| 0,7948718|
Andersen| Complications| EF| 26| 75| 0,347|
Krughaug| Complications| EF| 22| 75| 0,293|
Moroni| Complications| EF| 0| 20| 0|
Plate| Complications| IMN| 3| 30| 0,1|
Chappuis| Complications| IMN| 4| 16| 0,25|
Gradl| Complications| IMN| 12| 66| 0,1818182|
Schonnemann| Complications| IMN| 6| 31| 0,1935484|
Aita| Complications| IMN| 1| 16| 0,0625|
Dremstrop| Complications| IMN| 17| 44| 0,3863636|
Wong| Complications| PC| 1| 30| 0,033|
Kumaravel| Complications| PC| 4| 25| 0,16|

Dataset on my Dropbox: https://www.dropbox.com/s/j1urqzr99bt76ip/Basics%20excel%20file%20complication%20and%20reoperation%20rate.xlsx?dl=0

library(meta)
library(stargazer)
library(foreign)

All <- read.spss("C:\\Users\\313635aa.STUDENT\\Desktop\\Meta-Analyse Complications and Reoperations.sav", to.data.frame = T, use.value.labels = T)
All <- na.omit(All)

Complications <- All[which(All[,"Event_Type"] == "Complications"),]
Re_operation <- All[which(All[,"Event_Type"] == "Reoperations"),]

EF <- All[which(All[,"Treatment"] == "EF"),]
IMN <- All[which(All[,"Treatment"] == "IMN"),]
pc <- All[which(All[,"Treatment"] == "PC"),]
KW <- All[which(All[,"Treatment"] == "KW"),]
VPO <- All[which(All[,"Treatment"] == "VPO"),]

EF_C <- EF[which(EF[,"Event_Type"] == "Complications"),]
EF_R <- EF[which(EF[,"Event_Type"] == "Reoperations"),]

IMN_C <- IMN[which(IMN[,"Event_Type"] == "Complications"),]
IMN_R <- IMN[which(IMN[,"Event_Type"] == "Reoperations"),]

pc_C <- pc[which(pc[,"Event_Type"] == "Complications"),]
pc_R <- pc[which(pc[,"Event_Type"] == "Reoperations"),]

KW_C <- KW[which(KW[,"Event_Type"] == "Complications"),]
KW_R <- KW[which(KW[,"Event_Type"] == "Reoperations"),]

VPO_C <- VPO[which(VPO[,"Event_Type"] == "Complications"),]
VPO_R <- VPO[which(VPO[,"Event_Type"] == "Reoperations"),]

Output <- function(x, y, k.min = 10) {
  file <- metaprop(Events_n, N, Study_ID, data = x)
  forest.meta(file, studlab = T, pooled.totals = T, bysort = F)
  dev.copy2pdf(file = y, width = 11.69, height = 8.27)
  print(file)
}

R code on my Dropbox: https://www.dropbox.com/s/67pnfpi10qu110v/R%20code%20voor%20forrest%20en%20funnel%20plots.rtf?dl=0

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Model studies in one analysis using treatment as a five level moderator in a meta-regression
Dear Vito, Thank you for your reply. I tried to contact the statistics department numerous times, but did not receive any reply. That is why I started to look on the internet for help. Yours sincerely, Jay

Sent from my iPhone

> On 26 Jun 2017, at 22:05, Vito Michele Rosario Muggeo wrote:
>
> hi Jay,
> Consult a local statistician. Statistics is not what you think it is (namely simple
> computations, R and probably plotting...).
>
> regards,
> vito
>
> Jay Zola wrote:
>
>> Hello,
>>
>> I am a medical student, writing a meta-analysis on complication and
>> reoperation rates after the five most common treatments of distal radius
>> fractures. I have been busy with the statistics for months by myself, but
>> find it quite hard since our classes were very basic. Now I want to compare
>> the treatment modalities to see if there are significant differences. Using
>> R I was able to synthesize the complication rates and reoperation rates for
>> each treatment method. But I never had any R course and managed by trial and
>> error, so the code probably doesn't look that great. Someone told me I could
>> best model the data in one analysis using treatment as a five-level
>> moderator in a meta-regression. Can someone help me with the R code to do this?
>> Your help would be very much appreciated.
>> [remainder of quoted message snipped; the data and code are in the original post above]
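One common way to model treatment as a five-level moderator, as suggested to the poster, is a random-effects meta-regression on logit-transformed proportions. A sketch under the assumption that the metafor package is available (the poster used meta; metafor is a swapped-in alternative, and the data frame below uses only a few illustrative rows from the posted table):

```r
# Sketch using the 'metafor' package (an assumption; not the poster's code).
# Treatment enters as a factor moderator in a random-effects
# meta-regression on logit-transformed proportions (measure = "PLO").
library(metafor)

dat <- data.frame(
  Study     = c("Kumaravel", "Franck", "Plate", "Chappuis", "Wong"),
  Treatment = c("EF", "EF", "IMN", "IMN", "PC"),
  xi        = c(3, 2, 3, 4, 1),      # event counts
  ni        = c(23, 20, 30, 16, 30)  # sample sizes
)

fit <- rma(measure = "PLO", xi = xi, ni = ni,
           mods = ~ factor(Treatment), data = dat)
summary(fit)  # moderator coefficients compare each level to the reference
```

Setting the factor's reference level to the gold-standard treatment (e.g. with relevel()) makes each coefficient a direct comparison against it.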
[R] Change Rcode for a meta-analysis(netmeta) to use a random effects model instead of a mixed effects model
Hello, I am writing a meta-analysis on the complication and reoperation rates after 5 treatment modalities of a distal radius fracture. I have code to compare the complication and reoperation rates. Currently it is using a mixed-effects model. Is it possible to change the code so that a random-effects model is used?

Thank you very much,

Jay

R code:

library(meta)
library(readxl)

All <- read_excel("Basics excel file complication and reoperation rate.xlsx", sheet = 1)
names(All) <- c("Study_ID","Event_Type","Treatment","Events_n","N","nN")
All$Treatment <- factor(All$Treatment, levels = c("PC","EF","IMN","KW","VPO"))

# Outcomes
Complications <- subset(All, Event_Type == "Complications")
Reoperations <- subset(All, Event_Type == "Reoperations")

# Comparison of treatment effects to gold standard in the Complications subset
mtpr1 <- metaprop(Events_n, N, Study_ID, data = Complications)
meta::metareg(mtpr1, ~Treatment)

# Comparison of treatment effects to gold standard in the Reoperations subset
mtpr2 <- metaprop(Events_n, N, Study_ID, data = Reoperations)
meta::metareg(mtpr2, ~Treatment)

# Comparison of treatment effects to gold standard in the All dataset
# Interaction effects have been considered
mtpr <- metaprop(Events_n, N, Study_ID, data = All)
meta::metareg(mtpr, ~Treatment*Event_Type)

A part of the dataset:

Study| Event Type| Treatment| Number of Events (n)| N| n/N|
Kumaravel| Complications| EF| 3| 23| 0,1304348|
Franck| Complications| EF| 2| 20| 0,1|
Schonnemann| Complications| EF| 8| 30| 0,267|
Aita| Complications| EF| 1| 16| 0,0625|
Hove| Complications| EF| 31| 39| 0,7948718|
Andersen| Complications| EF| 26| 75| 0,347|
Krughaug| Complications| EF| 22| 75| 0,293|
Moroni| Complications| EF| 0| 20| 0|
Plate| Complications| IMN| 3| 30| 0,1|
Chappuis| Complications| IMN| 4| 16| 0,25|
Gradl| Complications| IMN| 12| 66| 0,1818182|
Schonnemann| Complications| IMN| 6| 31| 0,1935484|
Aita| Complications| IMN| 1| 16| 0,0625|
Dremstrop| Complications| IMN| 17| 44| 0,3863636|
Wong| Complications| PC| 1| 30| 0,033|
Kumaravel| Complications| PC| 4| 25| 0,16|

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Change Rcode for a meta-analysis(netmeta) to use a random effects model instead of a mixed effects model
Link Dropbox R code: https://www.dropbox.com/s/9u6e89t6dq39r53/Rcode%20metaregression.docx?dl=0

Link Dropbox part of dataset: https://www.dropbox.com/s/j1urqzr99bt76ip/Basics%20excel%20file%20complication%20and%20reoperation%20rate.xlsx?dl=0

From: Viechtbauer Wolfgang (SP)
Sent: Thursday, 29 June 2017 19:47
To: Jay Zola; r-help@r-project.org
Subject: RE: Change Rcode for a meta-analysis (netmeta) to use a random effects model instead of a mixed effects model

The code in your mail is a mangled mess, since you posted in HTML. Please configure your email client to send emails in plain text. Could you explain what exactly you mean by "Currently it is using a mixed effects model. Is it possible to change the code so a random effects model is used?"

Best, Wolfgang

>-----Original Message-----
>From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jay Zola
>Sent: Thursday, June 29, 2017 19:38
>To: r-help@r-project.org
>Subject: [R] Change Rcode for a meta-analysis (netmeta) to use a random
>effects model instead of a mixed effects model
>
>[remainder of quoted message snipped; the code and data are in the original post above]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Qvalue package: I am getting back 1,000 q values when I only want 1 q value.
What you're doing makes no sense. Given p-values p_i, i=1...n, resulting from hypothesis tests t_i, i=1...n, the q-value of p_i is the expected proportion of false positives among all n tests if the significance level of each test is α=p_i. Thus a q-value is only defined for an observed p-value.

Assuming that you have stored n observed p-values in an R vector P, and the ith p-value P[i]==.05, then the R syntax to obtain the q-value for P[i] is qvalue(P)$qvalues[i].

If instead (as I suspect) .05 is not among your observed p-values, but you want to know what the FDR would be, given your sequence of p-values, if the significance level of every test were .05, then the R syntax would be max(qvalue(P)$qvalues[P<=.05]).

On Fri, Jan 13, 2017 at 2:08 AM, Thomas Ryan wrote:

> Jim,
>
> Thanks for the reply. Yes I'm just playing around with the data at the
> minute, but regardless of where the p values actually come from, I can't
> seem to get a Q value that makes sense.
>
> For example, in one case, I have an actual P value of 0.05. I have a list
> of 1,000 randomised p values: the range of these randomised p values is 0.002
> to 0.795, the average of the randomised p values is 0.399 and the median of the
> randomised p values is 0.45.
>
> So I thought it would be reasonable to expect the FDR Q value (i.e. the
> number of expected false positives over the number of significant results)
> to be at least over 0.05, given that 869 of the randomised p values are
> > 0.05?
>
> When I run the code:
>
> library(qvalue)
> list1 <- scan("ListOfPValues")
> qobj <- qvalue(p = list1)
> qobj$pi0
>
> the answer is 0.0062. That's why I thought qobj$pi0 isn't the right
> variable to be looking at? So my problem (or my mis-understanding) is that
> I have an actual P value of 0.05, but then a Q value that is lower, 0.006?
> > > Thanks again for your help, > > Tom > > > > > > > > > On Thu, Jan 12, 2017 at 9:27 PM, Jim Lemon wrote: > > > Hi Tom, > > From a quick scan of the docs, I think you are looking for qobj$pi0. > > The vector qobj$qvalue seems to be the local false discovery rate for > > each of your randomizations. Note that the manual implies that the p > > values are those of multiple comparisons within a data set, not > > randomizations of the data, so I'm not sure that your usage is valid > > for the function.. > > > > Jim > > > > > > On Fri, Jan 13, 2017 at 4:12 AM, Thomas Ryan > > wrote: > > > Hi all, I'm wondering if someone could put me on the right path to > using > > > the "qvalue" package correctly. > > > > > > I have an original p value from an analysis, and I've done 1,000 > > > randomisations of the data set. So I now have an original P value and > > 1,000 > > > random p values. I want to work out the false discovery rate (FDR) (Q; > as > > > described by Storey and Tibshriani in 2003) for my original p value, > > > defined as the number of expected false positives over the number of > > > significant results for my original P value. > > > > > > So, for my original P value, I want one Q value, that has been > calculated > > > as described above based on the 1,000 random p values. > > > > > > I wrote this code: > > > > > > pvals <- c(list_of_p_values_obtained_from_randomisations) > > > qobj <-qvalue(p=pvals) > > > r_output1 <- qobj$pvalue > > > r_output2 <- qobj$qvalue > > > > > > r_output1 is the list of 1,000 p values that I put in, and r_output2 is > > a q > > > value for each of those p values (i.e. so there are 1,000 q values). > > > > > > The problem is I don't want there to be 1,000 Q values (i.e one for > each > > > random p value). The Q value should be the false discovery rate (FDR) > > (Q), > > > defined as the number of expected false positives over the number of > > > significant results. 
> > > So I want one Q value for my original P value, and
> > > to calculate that one Q value using the 1,000 random P values I have
> > > generated.
> > >
> > > Could someone please tell me where I'm going wrong.
> > >
> > > Thanks
> > > Tom

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
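The logic in the first reply can be sketched with base R alone: p.adjust()'s "BH" method gives Benjamini-Hochberg FDR-adjusted values that play the same role as q-values (an assumption worth noting: the Bioconductor qvalue package additionally estimates the null proportion pi0, which p.adjust does not, so the numbers differ slightly). The p-values below are made up for illustration:

```r
# Base-R sketch of the q-value logic using Benjamini-Hochberg adjustment.
# P holds illustrative observed p-values, one per hypothesis test.
P <- c(0.002, 0.01, 0.04, 0.05, 0.2, 0.45, 0.6, 0.79)

qv <- p.adjust(P, method = "BH")  # FDR-style adjusted p-values

# Estimated FDR if every test used significance level 0.05:
max(qv[P <= 0.05])
```

This mirrors the max(qvalue(P)$qvalues[P<=.05]) construction in the reply above, just without the pi0 estimate.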
[R] Setting fixed size for segement plot using stars() (axes size vs print size)
I have been making some segment plots with five variables. They work great, especially when I used a different scale function, which scaled them by the area of the circle rather than the radius:

scale <- function(x, Mr = 1, Mx = 100) { ((x/Mx)^.5)*Mr }

where x is the value, Mr is the maximum radius, and Mx is the maximum data value. You could change the exponent .5 to .57 if you wanted Flannery compensation. My problem is that I want the print size of these proportional symbols to be the same regardless of the number of data points, as in this example, where exporting these two plots as PDF (which have been scaled) will produce different size symbols for the same value when compared side by side. I've tried manually setting the ncol and nrow attributes, and it still produces different results for the two data sets.

stars(large[2:6], draw.segments = TRUE, labels = large$size, scale = FALSE, flip.labels = TRUE, axes = TRUE)
stars(small[2:6], draw.segments = TRUE, labels = small$size, scale = FALSE, flip.labels = TRUE, axes = TRUE)

Thanks!
small <- structure(list(size = c(5, 10, 15, 20, 25, 30, 50), one = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), two = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), three = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), four = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), five = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548)), .Names = c("size", "one", "two", "three", "four", "five"), row.names = c(NA, 7L), class = "data.frame") large <- structure(list(size = c(5L, 10L, 15L, 20L, 25L, 30L, 50L, 5L, 10L, 15L, 20L, 25L, 30L, 50L, 5L, 10L, 15L, 20L, 25L, 30L, 50L, 5L, 10L, 15L, 20L, 25L, 30L, 50L), one = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), two = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), three = c(0.223606797749979, 0.316227766016838, 
0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), four = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548), five = c(0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548, 0.223606797749979, 0.316227766016838, 0.387298334620742, 0.447213595499958, 0.5, 0.547722557505166, 0.707106781186548)), .Names = c("size", "one", "two", "three", "four", "five"), row.names = c(NA, -28L ), class = "data.frame") __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
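One workaround for device-independent symbol sizes (a sketch, not using stars(): base graphics symbols() with inches = FALSE draws radii in user coordinates, so a symbol's size no longer depends on how many glyphs share the device):

```r
# Sketch: fixed-size proportional circles via symbols(), inches = FALSE.
# The area-based scale function is the one from the post above.
scale <- function(x, Mr = 1, Mx = 100) ((x / Mx)^0.5) * Mr

vals <- c(5, 10, 15, 20, 25, 30, 50)
plot(c(0, 8), c(0, 2), type = "n", asp = 1, xlab = "", ylab = "")
symbols(seq_along(vals), rep(1, length(vals)),
        circles = scale(vals, Mr = 0.4, Mx = 50),
        inches = FALSE, add = TRUE)   # radii in x-axis units, not inches
```

Because asp = 1 and the axis limits are fixed, exporting plots with different numbers of symbols keeps equal values at equal printed sizes.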
[R] call perl
Hi, It may be an old question, but can anyone tell me how to call Perl from R? Thanks, Y.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
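A common answer is to shell out with system() or system2() and capture the output. A minimal sketch, assuming a perl executable is on the PATH (the script name in the second call is hypothetical):

```r
# Call Perl from R via the shell (assumes 'perl' is on the PATH).
out <- system2("perl", args = c("-e", shQuote("print 6 * 7")),
               stdout = TRUE)
out   # character vector holding Perl's standard output

# To run a script with arguments (hypothetical file names):
# system2("perl", args = c("myscript.pl", "input.txt"), stdout = TRUE)
```

Writing Perl's output to a file and reading it back with read.table() or scan() works just as well for larger results.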
Re: [R] reading web log file into R
Sebastian,

There is rarely a completely free lunch, but fortunately for us R has some wonderful tools to make this possible. R supports regular expressions with commands like grep(), gsub(), strsplit(), and others documented on the help pages. It's just a matter of constructing an algorithm that does the job. In your case, for example (though please note there are probably many different, completely reasonable approaches in R):

x <- scan("logfilename", what = "", sep = "\n")

should give you a vector of character strings, one line per element. Now, lines containing "GET" seem to identify interesting lines, so

x <- x[grep("GET", x)]

should trim it to only the interesting lines. If you want information from other lines, you'll have to treat them separately. Next, you might try

y <- strsplit(x, "[[:space:]]+")

which splits each line on runs of whitespace, returning a list (one component per line) of vectors based on the split. Try it. If it looks good, you might check

lapply(y, length)

to see if all lines contain the same number of fields. If so, you can then get quickly into a matrix,

z <- matrix(unlist(y), ncol = K, byrow = TRUE)

where K is the common length you just observed. If you think this is cool, great! If not, well... hire a programmer, or if you're lucky Microsoft or Apache have tools to help you with this. There might be something in the Perl/Python world. Or maybe there's a package in R designed just for this, but I encourage students to develop the raw skills...

Jay

-- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
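The steps above can be run end-to-end on fabricated log lines (the field layout below is illustrative only; real server logs differ):

```r
# Self-contained sketch of the steps above, on made-up log lines.
x <- c('10.0.0.1 - - [01/Jan/2010] "GET /index.html" 200',
       '10.0.0.2 - - [01/Jan/2010] "POST /form" 302',
       '10.0.0.3 - - [02/Jan/2010] "GET /about.html" 200')

x <- x[grep("GET", x)]                # keep only the GET lines
y <- strsplit(x, "[[:space:]]+")      # split each line on whitespace
K <- unique(sapply(y, length))        # common field count across lines

z <- matrix(unlist(y), ncol = K, byrow = TRUE)
z[, 1]                                # first column: the client IPs
```

If unique() returns more than one length, the lines are not uniform and need per-format handling before the matrix step.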
[R] RMySQL_0.7-4 core dumped on dbWriteTable
Good Afternoon: Have an R script that uses RMySQL package. Everything works great within 32 bit ubuntu linux environment (/usr/sbin/mysqld: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped). > mysqlClientLibraryVersions() 5.1.41 5.1.37 50141 50137 Now testing on 64 bit ubuntu linux environment (/usr/sbin/mysqld:ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped). > mysqlClientLibraryVersions() 5.0.75 5.0.75 50075 50075 Followed instructions for RMySQL installation (specifying MySQL headers and library directories) export PKG_CPPFLAGS="-I/usr/include/mysql" export PKG_LIBS="-L/usr/lib/ -lmysqlclient" (This is where the '/usr/lib64/mysql' symbolic link ends up). Made sure I could successfully query and write to the database otherwise (with RODBC). So far, can successfully connect and disconnect using RMySQL Also, am able to execute dbGetQuery command. However, upon executing the dbWriteTable command (see partial .RHistory below), R crashes with "***buffer overflow detected***: /usr/lib64/R/bin/exec/R terminated" How can I fix this? Appreciate your help. Sincerely, Jay James Castino, PE Principal JJCENG.COM, PC www.jjceng.com +1 (541) 633-7990 1560 NE 1st ST. 
#14 Bend, OR USA 97701 ## partial .RHistory ### >sessionInfo() R version 2.10.1 (2009-12-14) x86_64-pc-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base > library(RMySQL) Loading required package: DBI > con<- dbConnect(dbDriver("MySQL"), dbname = "knottlf_local",user="mysql", > password="xx", host="localhost") > ch4dd<-data.frame(6,"2010-03-06") > names(ch4dd)<-c("scada_terminal_id","timestamp_on") > ch4dd scada_terminal_id timestamp_on 1 6 2010-03-06 > dbGetQuery(con, "SELECT LAST_INSERT_ID() FROM > `knottlf_local`.`R_ch4_concentrations`") data frame with 0 columns and 0 rows > dbWriteTable(con, name = "R_ch4_concentrations",ch4dd, append = TRUE, > row.names = FALSE) *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated === Backtrace: = /lib/libc.so.6(__fortify_fail+0x37)[0x7fb4375292c7] /lib/libc.so.6[0x7fb437527170] /lib/libc.so.6[0x7fb437526519] /lib/libc.so.6(_IO_default_xsputn+0x96)[0x7fb4374a0426] /lib/libc.so.6(_IO_vfprintf+0x348d)[0x7fb437472e2d] /lib/libc.so.6(__vsprintf_chk+0x99)[0x7fb4375265b9] /lib/libc.so.6(__sprintf_chk+0x80)[0x7fb437526500] /home/jbiztino/R/x86_64-pc-linux-gnu-library/2.10/RMySQL/libs/RMySQL.so(RS_MySQL_exec+0x1be)[0x7fb4348630de] /usr/lib64/R/lib/libR.so[0x7fb437847ace] /usr/lib64/R/lib/libR.so(Rf_eval+0x6b6)[0x7fb437877ed6] /usr/lib64/R/lib/libR.so[0x7fb43787a0e0] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so[0x7fb43787a1ce] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2d3)[0x7fb43787ba93] /usr/lib64/R/lib/libR.so(Rf_eval+0x3c3)[0x7fb437877be3] /usr/lib64/R/lib/libR.so[0x7fb43787b39c] 
/usr/lib64/R/lib/libR.so(R_execMethod+0x241)[0x7fb43787b6d1] /usr/lib64/R/library/methods/libs/methods.so[0x7fb435259655] /usr/lib64/R/lib/libR.so[0x7fb4378c205c] /usr/lib64/R/lib/libR.so(Rf_eval+0x5dc)[0x7fb437877dfc] /usr/lib64/R/lib/libR.so[0x7fb43787802f] /usr/lib64/R/lib/libR.so(Rf_eval+0x26d)[0x7fb437877a8d] /usr/lib64/R/lib/libR.so(Rf_eval+0x64b)[0x7fb437877e6b] /usr/lib64/R/lib/libR.so[0x7fb43787802f] /usr/lib64/R/lib/libR.so(Rf_eval+0x26d)[0x7fb437877a8d] /usr/lib64/R/lib/libR.so(Rf_eval+0x64b)[0x7fb437877e6b] /usr/lib64/R/lib/libR.so[0x7fb437878f7d] /usr/lib64/R/lib/libR.so(Rf_eval+0x58e)[0x7fb437877dae] /usr/lib64/R/lib/libR.so[0x7fb43787a0e0] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so[0x7fb43787a1ce] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2d3)[0x7fb43787ba93] /usr/lib64/R/lib/libR.so(Rf_eval+0x3c3)[0x7fb437877be3] /usr/lib64/R/lib/libR.so[0x7fb43787ae41] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so[0x7fb43787a7e6] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so[0x7fb43787a1ce] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/libR.so[0x7fb43787a1ce] /usr/lib64/R/lib/libR.so(Rf_eval+0x46e)[0x7fb437877c8e] /usr/lib64/R/lib/lib
[R] ' R ' - General Question (newbie)
Hi, First off, I apologize if this is the wrong list to post to, but I would like to install and try out R as an alternative to SAS. As a newbie, could you please let me know about the following (in terms of online resources and print books)? I have previously used SAS/BASE in a Biostatistics/Epidemiology (Public Health) class, and am familiar with very basic terminology and SAS/BASE use.

1) Basics of R
2) Where to download R and how to install it on Windows (XP), plus any needed add-on modules (for data analysis and biostatistics procedures) and others similar to ODS in SAS.
3) Any print/online documentation for the beginning user of R.

Thanks, Jay

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
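On the install-and-add-on question, the base distribution comes from CRAN (https://cran.r-project.org), and add-on packages install from within R itself. A minimal sketch (the package name below is only an example of a CRAN package, not a specific recommendation):

```r
# Install an add-on package from CRAN and load it (example package name):
install.packages("epitools")   # epidemiology helper functions
library(epitools)

# Browse the bundled manuals ("An Introduction to R", etc.) in a browser:
help.start()
```

The same install.packages()/library() pattern covers essentially all add-on functionality, which is R's rough counterpart to SAS procedure modules.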
Re: [R] Building a big.matrix using foreach
Michael, If you have a big.matrix, you just want to iterate over the rows. I'm not in R and am just making this up on the fly (from a bar in Beijing, if you believe that): foreach(i=1:nrow(x),.combine=c) %dopar% f(x[i,]) should work, essentially applying the function f() to the rows of x. But perhaps I misunderstand you. Please feel free to email me or Mike (michael.k...@yale.edu) directly with questions about bigmemory; we are very interested in applications of it to real problems. Note that the package foreach uses package iterators, and is very flexible, in case you need more general iteration in parallel. Regards, Jay Original message: Hi there! I have become a big fan of the 'foreach' package, allowing me to do a lot of stuff in parallel. For example, evaluating the function f on all elements in a vector x is easily accomplished: foreach(i=1:length(x),.combine=c) %dopar% f(x[i]) Here the .combine=c option tells foreach to combine output using the c() function, that is, to return it as a vector. Today I discovered the 'bigmemory' package, and I would like to construct a big.matrix in a parallel fashion row by row. To use foreach I see no other way than to come up with a substitute for c in the .combine option. I have checked out the big.matrix manual, but I can't find a function suitable for just that. Actually, I wouldn't even know how to do it for a usual matrix. Any clues? Thanks! -- Michael Knudsen micknud...@gmail.com http://lifeofknudsen.blogspot.com/ -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
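A minimal, self-contained sketch of the row-iteration idea from the reply (using the sequential %do% operator so it runs without any parallel backend; the matrix and function f here are toy stand-ins, not from the thread):

```r
library(foreach)                     # provides foreach() and %do%/%dopar%
x <- matrix(rnorm(20), nrow = 5)     # toy stand-in for a big.matrix
f <- function(row) sum(row)          # any per-row computation
# Iterate over rows, combining the scalar results into a vector with c():
res <- foreach(i = 1:nrow(x), .combine = c) %do% f(x[i, ])
length(res)                          # one result per row, i.e. 5
```

Swapping %do% for %dopar% (with a registered backend) parallelizes the same loop unchanged.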
Re: [R] kmeans.big.matrix
This sort of question is ideal to send directly to the maintainer. We've removed kmeans.big.matrix for the time being and will place it in a new package, bigmemoryAnalytics. bigmemory itself is the core building block and tool, and we don't want to pollute it with lots of extras. Allan's point is right: big data packages (like bigmemory and ff) can't be used directly with R functions (like lm). And because of R's design you can't extract subsets with more than 2^31-1 elements, even though the big.matrix can be as large as you need (with filebacking). I hope that helps. Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mosaic plots
As pointed out by others, vcd supports mosaic plots on top of the grid engine (which is extremely helpful for those of us who love playing around with grid). The standard mosaicplot() function is also directly available (it isn't clear if you knew this). The proper display of names is a real challenge faced by all of us with these plots, so you should try each version. I'm not sure what you intend to do with a legend, but if you want the ability to customize and hack code, I suggest you look at grid and a modification to vcd's version to suit your purposes. Jay >> Subject: [R] Mosaic Plots >> Hello Everyone, >> I want to plot mosaic plots. I have tried them using the iplots package (using imosaic). The problem is the names don't get aligned properly; is there a way to align the names and provide a legend in mosaic plots using R? >> Also, I would like to know of any other packages with which I can plot mosaic plots. >> Thank you in advance >> Sunita -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
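For reference, a minimal base-R example using the built-in HairEyeColor table (the las/cex settings are generic graphics parameters shown to ease label overlap, not something prescribed in the thread):

```r
# Base R mosaic plot; rotating labels reduces the name-alignment problem.
mosaicplot(~ Hair + Eye, data = HairEyeColor, color = TRUE,
           las = 2,                        # perpendicular axis labels
           main = "Hair vs. Eye color")
# The grid-based alternative lives in package vcd:
# library(vcd); mosaic(~ Hair + Eye, data = HairEyeColor)
```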
Re: [R] large dataset
A little more information would help, such as the number of columns. I imagine it must be large, because 100,000 rows isn't overwhelming. Second, does the read.csv() fail, or does it work but only after a long time? And third, how much RAM do you have available? R Core provides some guidelines in the Installation and Administration documentation suggesting that a single object around 10% of your RAM is reasonable, but beyond that things can become challenging, particularly once you start working with your data. There is a wide range of packages to help with large data sets. For example, RMySQL supports MySQL databases. At the other end of the spectrum, there are possibilities discussed on a nice page by Dirk Eddelbuettel which you might look at: http://cran.r-project.org/web/views/HighPerformanceComputing.html Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay (original message below) -- Message: 128 Date: Sat, 27 Mar 2010 10:19:33 +0100 From: "n.vial...@libero.it" To: "r-help" Subject: [R] large dataset Hi, I have a question, as I'm not able to import a csv file which contains a big dataset (100,000 records). Does someone know how many records R can handle without giving problems? What I'm facing when I try to import the file is that R generates more than 100,000 records and is very slow... thanks a lot!!!
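One standard way to speed up read.csv() on a large file is to tell it the column types up front (the file name here is hypothetical, and a real file would need its own colClasses):

```r
# Peek at a few rows first to pin down the column classes; this keeps
# read.csv() from re-guessing types while scanning the whole file.
peek    <- read.csv("big.csv", nrows = 10)       # hypothetical file name
classes <- sapply(peek, class)
big <- read.csv("big.csv",
                colClasses = classes,   # skip type guessing
                nrows = 110000,         # generous row-count hint aids allocation
                comment.char = "")      # disable comment scanning
```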
Re: [R] Huge data sets and RAM problems
Stella, A few brief words of advice: 1. Work through your code a line at a time, making sure that each step gives what you would expect. I think some of your later problems are a result of something early not being as expected. For example, if the read.delim() is in fact not giving you what you expect, stop there before moving onwards. I suspect some funny character(s) or character encodings might be the problem. 2. 32-bit Windows can be limiting. With 2 GB of RAM, you're probably not going to be able to work effectively in native R with objects over 200-300 MB, and the error indicates that something (you or a package you're using) has simply run out of memory. So... 3. Consider more RAM (and preferably 64-bit R). Other solutions might be possible, such as using a database to handle the data's transition into R. 2.5 million rows by 18 columns is apt to be around 360 MB. Although you can afford 1 (or a few) copies of this, it doesn't leave you much room for the memory overhead of working with such an object. Part of the original message below. Jay - Message: 80 Date: Mon, 19 Apr 2010 22:07:03 +0200 From: Stella Pachidi To: r-h...@stat.math.ethz.ch Subject: [R] Huge data sets and RAM problems Dear all, I am using R 2.10.1 on a laptop with a Windows 7 32-bit system, 2GB RAM and an Intel Core Duo 2GHz CPU. [...] Finally, another problem I have is when I perform association mining on the data set using the package arules: I turn the data frame into a transactions table and then run the apriori algorithm. When I set the support too low in order to find the rules I need, the vector of rules becomes too big and I get memory problems such as: Error: cannot allocate vector of size 923.1 Mb In addition: Warning messages: 1: In items(x) : Reached total allocation of 153Mb: see help(memory.size) Could you please help me with how I could allocate more RAM?
Or, do you think there is a way to process the data by loading them into a document instead of loading all into RAM? Do you know how I could manage to read all my data set? I would really appreciate your help. Kind regards, Stella Pachidi -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
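A quick way to sanity-check object sizes before hitting allocation errors (the arithmetic mirrors the 2.5 million x 18 estimate in the reply; memory.limit() is Windows-only):

```r
# Estimated size of a 2.5 million x 18 matrix of doubles, in MB:
2.5e6 * 18 * 8 / 2^20          # about 343 MB

# Measured size of an object already in the workspace:
x <- rnorm(1e6)                # one million doubles
print(object.size(x), units = "MB")   # roughly 7.6 MB

# On 32-bit Windows, memory.limit() reports (or raises) the session cap.
```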
Re: [R] bigmemory package woes
Zerdna, Please note that the CRAN version 3.12 is about to be replaced by a new cluster of packages now on R-Forge; we consider the new bigmemory >= 4.0 to be "stable" and recommend you start using it immediately. Please see http://www.bigmemory.org. In your case, two comments: (1) Your for() loop will generate three identical copies of filebackings on disk, yes. Note that when the loop exits, the R object xx will reference only the 3rd of these, so xx[1,1] <- 1 will modify only the third filebacking, not the first two. You'll need to use the separate descriptor files (probably created automatically for you, but we recommend naming them specifically using descriptorfile=) to attach.big.matrix() whichever of these you really want to be using. (2) In the problem with "hanging" I believe you have exhausted the shared resources on your system. This problem will no longer arise in the >= 4.0 versions, as we're handling mutexes separately rather than automatically. These shared resource limits are mysterious, depending on the OS as well as the hardware and other jobs or tasks in existence at any given point in time. But again, it shouldn't be a problem with the new version. The CRAN update should take place early next week, along with some revised documentation. Regards, Jay --- Message: 125 Date: Fri, 23 Apr 2010 13:51:32 -0800 (PST) From: zerdna To: r-help@r-project.org Subject: [R] bigmemory package woes Message-ID: <1272059492009-2062996.p...@n4.nabble.com> I have pretty big data sizes, like matrices of .5 to 1.5GB, so once I need to juggle several of them I am in need of a disk cache. I am trying to use the bigmemory package but am getting problems that are hard to understand. I am getting seg faults and the machine just hanging. I work, by the way, on Red Hat Linux, 64 bit, R version 10. Simplest problem is just saving matrices.
When I do something like r <- matrix(rnorm(100), nr=10); library(bigmemory); for(i in 1:3) xx <- as.big.matrix(r, backingfile=paste("r", i, sep="", collapse=""), backingpath=MyDirName) it works just fine -- it saves the small matrices as three different matrices on disk. However, when I try it at real size, like with r <- matrix(rnorm(5000), nr=1000), I am either getting a seg fault on saving the third big matrix, or it hangs forever. Am I doing something obviously wrong, or is it an unstable package at the moment? Could anyone recommend something similar that is reliable in this case? -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
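A sketch of the descriptor-file pattern the reply recommends (file and directory names here are hypothetical; the calls follow the bigmemory >= 4.0 interface):

```r
library(bigmemory)
# Name the backing and descriptor files explicitly so each filebacked
# matrix can be re-attached later; the names and path are hypothetical.
x <- filebacked.big.matrix(1000, 1000, type = "double", init = 0,
                           backingfile    = "r1.bin",
                           descriptorfile = "r1.desc",
                           backingpath    = tempdir())
x[1, 1] <- 42
# In another session (or after rm(x)), re-attach via the descriptor file:
y <- attach.big.matrix(file.path(tempdir(), "r1.desc"))
y[1, 1]                      # reads the value written through x
```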
[R] Help producing plot for assessing forecasting accuracy
Dear colleagues, I'm trying (and failing) to write the script required to generate a chart that would help me assess the forecasting accuracy of a logistic regression model by plotting the cumulative proportion of observed events occurring in cases across the range of possible predicted probabilities. In other words, let: x = any value on 0-1 scale phat_i = predicted probability of event Y from logit model for case i y_i = observed outcome (0/1) for case i Y_cond = sum(y_i) conditional on phat_i <= x Y_tot = total number of events observed in sample What I'm trying to plot is (Y_cond)/(Y_tot) across all values of x. I would be grateful for any guidance you can offer, and I'm sorry if I've overlooked some really simple solution; I'm fairly new to R and learning by doing. Regards, Jay -- Jay Ulfelder, Ph.D. Research Director Political Instability Task Force Science Applications International Corp. (SAIC) jay_ulfel...@stanfordalumni.org (301) 588-8478 [home office] (301) 580-8736 [mobile] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
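One way to sketch the requested curve in base R, following the post's notation (phat and y below are simulated stand-ins, since no data accompanied the message):

```r
set.seed(1)
phat <- runif(200)                 # predicted probabilities (stand-in)
y    <- rbinom(200, 1, phat)       # observed 0/1 outcomes (stand-in)
x    <- seq(0, 1, by = 0.01)       # grid of cutoff values

# For each cutoff x, the cumulative share of all observed events
# occurring in cases with phat <= x:  Y_cond / Y_tot
frac <- sapply(x, function(cut) sum(y[phat <= cut]) / sum(y))

plot(x, frac, type = "s",
     xlab = "predicted probability cutoff",
     ylab = "cumulative proportion of observed events")
```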
Re: [R] Estimation in a changepoint regression with R
Package bcp does Bayesian changepoint analysis, though not in the general regression framework. The most recent reference is Bioinformatics 24(19) 2143-2148; doi: 10.1093/bioinformatics/btn404; slightly older is JSS 23(3). Both reference some alternatives you might want to consider (including strucchange, among others). Jay Message: 4 Date: Thu, 15 Oct 2009 03:56:22 -0700 (PDT) From: FMH Subject: [R] Estimation in a changepoint regression with R To: r-help@r-project.org Dear All, I'm trying to do the estimation in a changepoint regression problem via R, but have never found any suitable function which might help me to do this. Could someone give me a hand on this matter? Thank you. -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Re: [R] Multicore package: sharing/modifying variable accross processes
Renaud, Package bigmemory can help you with shared-memory matrices, either in RAM or filebacked. Mutex support currently exists as part of the package, although for various reasons it will soon be abstracted from the package and provided via a new package, synchronicity. bigmemory works beautifully with multicore. Feel free to email us with questions; we appreciate feedback. Jay Original message: Hi, I want to parallelize some computations when possible on multicore machines. Each computation produces a big object that I don't want to store if not necessary: in the end only the object that best fits my data has to be returned. In non-parallel mode, a single global object is updated if the current computation gets a better result than the best previously found. My plan was to use package multicore. But there is obviously an issue of concurrent access to the global result variable. Is there a way to implement something like a lock/mutex to make the procedure thread-safe? Maybe something already exists to deal with such things? It looks like package multicore runs the different processes in different environments with copy-on-change of everything when forking. Has anybody experimented with a shared environment with package multicore? -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Re: [R] bigmemory - extracting submatrix from big.matrix object
Thanks for trying this out. Problem 1. We'll check this. Options should certainly be available. Thanks! Problem 2. Fascinating. We just (yesterday) implemented a sub.big.matrix() function doing exactly this, creating something that is a big matrix but which just references a contiguous subset of the original matrix. This will be available in an upcoming version (hopefully in the next week). A more specialized function would create an entirely new big.matrix from a subset of a first big.matrix, making an actual copy, but this is something else altogether. You could do this entirely within R without much work, by the way, and only 2* memory overhead. Problem 3. You can count missing values using mwhich(). For other exploration (e.g. skewness) at the moment you should just extract a single column (variable) at a time into R, study it, then get the next column, etc... . We will not be implementing all of R's functions directly with big.matrix objects. We will be creating a new package "bigmemoryAnalytics" and would welcome contributions to the package. Feel free to email us directly with bugs, questions, etc... Cheers, Jay -- From: utkarshsinghal Date: Tue, Jun 2, 2009 at 8:25 AM Subject: [R] bigmemory - extracting submatrix from big.matrix object To: r help I am using the library(bigmemory) to handle large datasets, say 1 GB, and facing following problems. Any hints from anybody can be helpful. _Problem-1: _ I am using "read.big.matrix" function to create a filebacked big matrix of my data and get the following warning: > x = > read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile > = "backup", backingpath = "/home/utkarsh.s") Warning message: In filebacked.big.matrix(nrow = numRows, ncol = numCols, type = type, : A descriptor file has not been specified. A descriptor named backup.desc will be created. However there is no such argument in "read.big.matrix". 
There is an argument "descriptorfile" in the function "as.big.matrix", but if I try to use it in "read.big.matrix" I get an error showing it as an unused argument (as expected). _Problem-2:_ I want to get a filebacked *sub*matrix of "x", say only selected columns: x[, 1:100]. Is there any way of doing that without actually loading the data into R memory? _Problem-3:_ There are functions available like summary, colmean, colsd, ... for standard summary statistics. But is there any way to calculate other summaries, say the number of missing values or the skewness of each variable, without loading the whole data into R memory? Regards Utkarsh -- John W. Emerson (Jay) Assistant Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Re: [R] bigmemory - extracting submatrix from big.matrix object
Utkarsh, Thanks again for the feedback and suggestions on bigmemory. A follow-up on counting NAs: we have exposed a new function colna() to the user in upcoming release 3.7. Of course mwhich() can still be helpful. As for the last topic -- applying any function to columns of a big.matrix object: once you peel away the shell, a big.matrix column is identical to an R matrix column (or vector) -- a pointer, a length, and knowledge of the type are sufficient. Because we (ideally) want to support our current 4 types (and hopefully add complex and maybe more, soon), we rely on C++ template functions for the summaries we have implemented to date. But yes, looking at our implementation of colmean(), for example, would be a good place to start. Keep in mind that there are differences between big.matrix objects and R internals. bigmemory indexes everything using longs instead of integers (and uses numerics when passing indices between R and C/C++). So simply using an existing R function (or the C function under the hood of R) would be limiting not only with respect to the various types of big.matrix objects, but also with respect to the size. On 64-bit R platforms, there is no practical limit to the size of a filebacked big.matrix (other than your disk space or filesystem limitations). But R won't handle vectors in excess of 2 billion elements, even if you have the RAM to support such beasts. Operating on chunks within R is of course another possibility. Further discussion of development ideas would be great, but should probably be moved offline or over to R-devel. As always, we appreciate feedback, complaints, bug reports, etc. Thanks, Jay On Wed, Jun 3, 2009 at 3:16 AM, utkarshsinghal wrote: > Thanks for the really valuable inputs, developing the package and updating > it regularly. I will be glad if I can contribute in any way.
> > In problem three, however, I am interested in knowing a generic way to apply > any function on columns of a big.matrix object (obviously without loading > the data into R). May be the source code of the function "colmean" can help, > if that is not too much to ask for. Or if we can develop a function similar > to "apply" of the base R. > > > Regards > Utkarsh > > > > > Jay Emerson wrote: >> >> We also have ColCountNA(), which is not currently exposed to the user >> but will be in the next version. >> >> Jay >> >> On Tue, Jun 2, 2009 at 2:08 PM, Jay Emerson wrote: >> >>> >>> Thanks for trying this out. >>> >>> Problem 1. We'll check this. Options should certainly be available. >>> Thanks! >>> >>> Problem 2. Fascinating. We just (yesterday) implemented a >>> sub.big.matrix() function doing exactly >>> this, creating something that is a big matrix but which just >>> references a contiguous subset of the >>> original matrix. This will be available in an upcoming version >>> (hopefully in the next week). A more >>> specialized function would create an entirely new big.matrix from a >>> subset of a first big.matrix, >>> making an actual copy, but this is something else altogether. You >>> could do this entirely within R >>> without much work, by the way, and only 2* memory overhead. >>> >>> Problem 3. You can count missing values using mwhich(). For other >>> exploration (e.g. skewness) >>> at the moment you should just extract a single column (variable) at a >>> time into R, study it, then get the >>> next column, etc... . We will not be implementing all of R's >>> functions directly with big.matrix objects. >>> We will be creating a new package "bigmemoryAnalytics" and would >>> welcome contributions to the >>> package. >>> >>> Feel free to email us directly with bugs, questions, etc... 
>>> >>> Cheers, >>> >>> Jay >>> >>> >>> -- >>> >>> From: utkarshsinghal >>> Date: Tue, Jun 2, 2009 at 8:25 AM >>> Subject: [R] bigmemory - extracting submatrix from big.matrix object >>> To: r help >>> I am using the library(bigmemory) to handle large datasets, say 1 GB, >>> and facing following problems. Any hints from anybody can be helpful. >>> _Problem-1: >>> _ >>> I am using "read.big.matrix" function to create a filebacked big >>> matrix of my data and get the following warning: >>> >>>> >>>> x = >>>> read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,ba
[R] [R-pkgs] Major bigmemory revision released.
The re-engineered bigmemory package is now available (Version 3.5 and above) on CRAN. We strongly recommend you cease using the older versions at this point. bigmemory now offers completely platform-independent support for the big.matrix class in shared memory and, optionally, as filebacked matrices for larger-than-RAM applications. We're working on updating the package vignette, and a draft is available upon request (just send me an email if you're interested). The user interface is largely unchanged. Feedback, bug reports, etc... are welcome. Jay Emerson & Michael Kane -- John W. Emerson (Jay) Assistant Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bivariate pdf and marginal pdf of non-standard distributions
Hi, I am trying to find marginal pdfs when the joint pdf of two continuous variables is given. A simple case is like this: the joint pdf is f(x,y) = 4xy, where x lies between 0 and 1, and y also lies between 0 and 1. I tried the following crude way of doing it, but it is not complete. Could you post a suggestion? Thanks, Jay Liu, University of Tennessee at Knoxville
##
# f(x,y) = 4xy, range of x is (0,1), range of y is (0,1)
# Checking that f(x,y) is a joint pdf (the double integral should be 1)
Pxy <- integrate(function(y) {
  sapply(y, function(y) {
    integrate(function(x) {
      sapply(x, function(x) 4*x*y)
    }, 0, 1)$value
  })
}, 0, 1)
print("Value of int int f(x,y) dx dy")
Pxy
# To find the marginal distribution, I tried this, but it is incorrect
# because x is not considered while integrating the joint pdf.
# Assume x to be a constant: Px1 = f(x)
Px1 <- integrate(function(y) { 4*y }, 0, 1)$value
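For this example the marginal of X is f_X(x) = integral of 4xy over y in (0,1), which equals 2x; numerically, the only change needed is keeping x fixed inside the integral. A sketch (vectorized over x so it can also be plotted):

```r
# Marginal pdf of X: integrate the joint pdf f(x,y) = 4xy over y,
# for each fixed value of x.
fX <- function(xs) {
  sapply(xs, function(x) {
    integrate(function(y) 4 * x * y, 0, 1)$value
  })
}
fX(c(0.25, 0.5, 1))         # equals 2*x: 0.5, 1.0, 2.0
# A marginal pdf must itself integrate to 1 over (0,1):
integrate(fX, 0, 1)$value   # 1
```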
Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore
>>> See inline for responses. But people are always welcome to contact >>> us directly. Hi all, I'm on a Linux server with 48Gb RAM. I did the following: x <- big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50)) #Gets around the 2^31 issue - yeah! >>> We strongly discourage use of dimnames. in Unix, when I hit the "top" command, I see R is taking up about 18Gb RAM, even though the object x is 0 bytes in R. That's fine: that's how bigmemory is supposed to work I guess. My question is how do I return that RAM to the system once I don't want to use x any more? E.g., rm(x) then "top" in Unix, I expect that my RAM footprint is back ~0, but it remains at 18Gb. How do I return RAM to the system? >>> It can take a while for the OS to free up memory, even after a gc(). >>> But it's available for re-use; if you want to be really sure, have a look >>> in /dev/shm to make sure the shared memory segments have been >>> deleted. Thanks, Matt -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay <http://www.stat.yale.edu/%7Ejay> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Stats Package Fix
Good afternoon, Using this file, with the tab for Growth_Value (I have pasted some code below), if I add the argument *exact = F* it produces the Wilcoxon signed rank test with continuity correction. The documentation leads me to believe that's the wrong argument (it should be the "correct" argument, not the "exact" argument). I am using RStudio Version 1.2.1335 running R 3.6.1. This was posed in our Business Statistics class, taught at Utah Valley University, of which I am a senior undergraduate. A response by tomorrow would be ideal, but I would still like an answer even if that timeline is too aggressive. Thank you for your consideration. My contact information is: Jay Spencer Waldron Personal Email (this is the one I am subscribed to R-Help with): gateofti...@gmail.com (385) 335-7879
#---Code paste begins---
> library(readxl)
> Growth_Value <- read_excel("2019 SUMMER/ECON-3340-003/Ch20/Chapter20.xlsx",
+   sheet = "Growth_Value")
> View(Growth_Value)
> wilcox.test(Growth_Value$'Growth', mu=5, alternative="greater")
Wilcoxon signed rank test
data: Growth_Value$Growth V = 40, p-value = 0.1162 alternative hypothesis: true location is greater than 5
> wilcox.test(Growth_Value$'Growth', Growth_Value$'Value', alternative="two.sided", paired=TRUE)
Wilcoxon signed rank test
data: Growth_Value$Growth and Growth_Value$Value V = 40, p-value = 0.2324 alternative hypothesis: true location shift is not equal to 0
> wilcox.test(Growth_Value$'Growth', mu=5, alternative="greater", exact=F)
Wilcoxon signed rank test with continuity correction
data: Growth_Value$Growth V = 40, p-value = 0.1106 alternative hypothesis: true location is greater than 5
#---Code Paste Ends---
*Documentation referenced:* *exact: a logical indicating whether an exact p-value should be computed. correct: a logical indicating whether to apply continuity correction in the normal approximation for the p-value.* Thank you for your time, Jay Educational Email: 10809...@my.uvu.edu
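The behavior in the post matches wilcox.test()'s documented design rather than a bug: exact=FALSE switches from the exact distribution to the normal approximation, and correct (TRUE by default) controls the continuity correction within that approximation. A self-contained illustration (toy data, not the spreadsheet from the post):

```r
set.seed(42)
g <- rnorm(12, mean = 6)     # toy stand-in for the Growth column
# Exact test (the default for small samples without ties):
wilcox.test(g, mu = 5, alternative = "greater")
# Normal approximation WITH the continuity correction; the method line
# reads "... with continuity correction":
wilcox.test(g, mu = 5, alternative = "greater", exact = FALSE)
# Normal approximation WITHOUT the correction:
wilcox.test(g, mu = 5, alternative = "greater",
            exact = FALSE, correct = FALSE)
```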
Re: [R] t-statistic for independent samples
Dear David, On Wed, Apr 17, 2013 at 6:24 PM, David Arnold wrote: > Hi, [snip] > > D. Before posting to StackExchange, check out the Wikipedia entry for "Behrens-Fisher problem". Cheers, Jay -- G. Jay Kerns, Ph.D. Youngstown State University http://people.ysu.edu/~gkerns/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] practical to loop over 2million rows?
New to R and having issues with loops. I am aware that I should use vectorization whenever possible and use the apply functions; however, sometimes a loop seems necessary. I have a data set of 2 million rows and have tried running a couple of loops of varying complexity to test efficiency. If I do a very simple loop, such as adding every item in a column, I get an answer quickly. If I use a nested ifelse statement in a loop, it takes me 13 minutes to get an answer on just 50,000 rows. I am aware of a few methods to speed up loops: preallocate memory and compute as much outside the loop as possible (or create functions and just loop over the function), but it seems that even with these speed-ups I might have too much data to run loops. Here is the loop I ran that took 13 minutes. I realize I can accomplish the same goal using vectorization (and in fact did so). y <- numeric(length(x)); for(i in 1:length(x)) ifelse(!is.na(x[i]), y[i] <- x[i], ifelse(strataID[i+1]==strataID[i], y[i] <- x[i+1], y[i] <- x[i-1])) Presumably, complicated loops would be more intensive than the nested if statement above. If I write more efficient loops the time will come down, but I wonder if I will ever be able to write efficient enough code to perform a complicated loop over 2 million rows in a reasonable time. Is it useless for me to try to do any complicated loops on 2 million rows, or if I get much better at programming in R will it be manageable even for complicated situations? Jay
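A hedged vectorized equivalent of that fill-in logic, using shifted copies of the vectors instead of indexing one element at a time (the data here are toy values, and the first/last rows fall back to NA by construction):

```r
x        <- c(1, NA, 3, NA, 5)
strataID <- c("a", "a", "a", "b", "b")
n <- length(x)
nxt  <- c(x[-1], NA)         # x shifted left:  nxt[i]  is x[i+1]
prev <- c(NA, x[-n])         # x shifted right: prev[i] is x[i-1]
# sameStrata[i] is TRUE when strataID[i+1] == strataID[i]:
sameStrata <- c(strataID[-1] == strataID[-n], NA)
y <- ifelse(!is.na(x), x,
            ifelse(sameStrata, nxt, prev))
y   # 1, 3, 3, 5, 5
```

On 2 million rows this runs in a fraction of a second, because every operation is a single vectorized pass rather than 2 million interpreted iterations.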
[R] error msg using na.approx "x and index must have the same length"
Below I have written out some simplified data from my dataset. My goal is to interpolate Price based on timestamp; the closer a Price is in time to another price, the more like that price it will be. I want the interpolations within each St, not across St (St is a factor with levels A, B, and C). Unfortunately, I get error messages from the code I wrote. In the end only IDs 10 and 14 will receive interpolated values, because all other NAs occur at the beginning of a level. My code is given below the dataset. ID is int, St is factor with 3 levels, timestamp is POSIXlt, Price is num. The data.frame name is portfolio.
ID St timestamp               Price
1  A  2012-01-01 12:50:24.760    NA
2  A  2012-01-01 12:51:25.860 72.09
3  A  2012-01-01 12:52:21.613 72.09
4  A  2012-01-01 12:52:42.010 75.30
5  A  2012-01-01 12:52:42.113 75.30
6  B  2012-01-01 12:56:20.893    NA
7  B  2012-01-01 12:56:46.023 67.70
8  B  2012-01-01 12:57:19.300 76.06
9  B  2012-01-01 12:58:20.750 77.85
10 B  2012-01-01 12:58:20.797    NA
11 B  2012-01-01 12:59:19.527 79.57
12 C  2012-01-01 13:00:21.847 81.53
13 C  2012-01-01 13:00:21.860 81.53
14 C  2012-01-01 13:00:21.873    NA
15 C  2012-01-01 13:00:43.493 84.69
16 D  2012-01-01 12:01:21.520 24.63
17 D  2012-01-01 12:02:18.880 21.13
I tried the following using na.approx from the zoo package: interpolatedPrice <- unlist(tapply(portfolio$Price, portfolio$St, na.approx, portfolio$timestamp, na.rm=FALSE)) but keep getting the error "Error in na.approx.default(X[[1L]], ...) : x and index must have the same length". I checked the length of every variable in the formula and they all have the same length, so I am not sure why I get the error message. Jay
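The error arises because tapply() splits only Price by St while passing the full, unsplit timestamp vector to na.approx() for every group, so within each call x and its time index have different lengths. A sketch of a fix that splits both vectors by the same factor (toy data below; the same idea applies to the portfolio data frame, assuming it is sorted by St):

```r
library(zoo)  # provides na.approx()
df <- data.frame(St = c("A", "A", "A", "B", "B", "B"),
                 t  = as.POSIXct("2012-01-01") + c(1, 5, 9, 2, 4, 8),
                 Price = c(70, NA, 74, 60, NA, 66))
# Split Price AND its time index by the same factor, interpolate per group:
df$interp <- unlist(mapply(na.approx,
                           split(df$Price, df$St),   # x, per group
                           split(df$t, df$St),       # matching index, per group
                           MoreArgs = list(na.rm = FALSE),
                           SIMPLIFY = FALSE))
df$interp   # 70, 72, 74, 60, 62, 66 (time-weighted within each St)
```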
[R] First time Rcpp user needs compiler help, I think
I am trying to write a C++ function to be called from R and have never done this before. I have done the following: require(Rcpp) require(inline) src <- 'blahblahblah' fun <- cxxfunction(signature(a="numeric",b="numeric"),src,plugin="Rcpp") That last line generates the error message: Error in compileCode(f, code, language = language, verbose = verbose) : Compilation ERROR, function(s)/method(s) not created! In addition: Warning message: running command 'C:/PROGRA~1/R/R-215~1.1/bin/i386/R CMD SHLIB file26476eb7fd0.cpp 2> file26476eb7fd0.cpp.txt' had status 1 I am assuming that I don't have the necessary compiler installed on my machine. I tried installing Rtools, which created a C:/Rtools/ directory with the files, but I am not sure if my R software knows how to properly access it. I was advised to make sure my path is set correctly, but wasn't sure which path--my library search path in R, my Rtools installation setting for "PATH", my "PATH" in DOS, or what. I am operating on Windows 7. I would greatly appreciate some instructions on how to set this up. All the manuals and posts seem to gloss over the details of setting up the C++ compiler (I have looked at The Art of R Programming, and also the R-admin.pdf file, among other sources). I assume it is something that can be done in 5 minutes, if I only knew how ... Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
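The path that matters here is the Windows PATH environment variable, which must include the Rtools bin directories so that R CMD SHLIB can find g++. The exact directory names vary with the Rtools version, so treat the paths below as conventional examples, not definitive locations. From within R you can at least check what the current session sees:

```r
# Check whether the current R session can see the Rtools toolchain.
# (c:/Rtools/bin and the gcc bin directory are the conventional install
# locations; they may differ on your machine or Rtools version.)
Sys.which("g++")    # "" means the compiler is not on the PATH
Sys.which("gcc")
grepl("Rtools", Sys.getenv("PATH"), ignore.case = TRUE)
```

If `Sys.which("g++")` comes back empty, the usual remedy is to add the Rtools directories to the Windows PATH (the Rtools installer offers to do this) and restart R.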
Re: [R] Parallel computing on Windows (foreach) (Sergey Goriatchev)
foreach (or virtually anything you might use for concurrent programming) only really makes sense if the work the "clients" are doing is substantial enough to overwhelm the communication overhead. And there are many ways to accomplish the same task more or less efficiently (for example, doing blocks of tasks in chunks rather than passing each one as an individual job). But more to the point, doSNOW works just fine on an SMP -- no problem, and it doesn't require a cluster. Jay

Original message: Not only is the sequential foreach much slower than the simple for-loop (at least in this particular instance), but I am not quite sure how to make foreach run in parallel. Where would I get this parallel backend? I looked at doMC and doRedis, but these do not run on Windows, as far as I understand. And doSNOW is something to use when you have a cluster, while I have a simple dual-core PC. It is not really clear to me how to make parallel computing work. Please, help. Regards, Sergey

-- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
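To make the "doSNOW works fine on a dual-core PC" point concrete, a minimal sketch of registering a local backend -- the "cluster" is just two worker processes on the same machine:

```r
library(doSNOW)   # also loads snow and foreach

cl <- makeCluster(2, type = "SOCK")   # two local worker processes
registerDoSNOW(cl)

# a trivially small job, just to show the mechanics
res <- foreach(i = 1:4, .combine = c) %dopar% i^2
stopCluster(cl)
res   # 1 4 9 16
```

For real use, each iteration's work should be substantial (seconds, not microseconds), or the communication overhead will dominate exactly as described above.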
Re: [R] (help) This is an R workspace memory processing question
You should look at packages like ff, bigmemory, RMySQL, and so on. However, you should really consider moving to a different platform for large-data work (Linux, Mac, or Windows 7 64-bit). Jay - Original question: This is an R workspace memory processing question. Is there a method in R for processing 10GB of data in 500MB units? My work environment: R version 2.11.1, OS: WinXP Pro sp3 -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
[R] Error installing tkrplot
Hi, I've been trying to install tkrplot and have been coming across this error.

Loading required package: tcltk
Loading Tcl/Tk interface ... Error : .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared library '/Library/Frameworks/R.framework/Resources/library/tcltk/libs/i386/tcltk.so':
  dlopen(/Library/Frameworks/R.framework/Resources/library/tcltk/libs/i386/tcltk.so, 10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
  Referenced from: /Library/Frameworks/R.framework/Resources/library/tcltk/libs/i386/tcltk.so
  Reason: image not found
Error: package 'tcltk' could not be loaded

I have been clicking on 'tkrplot' in 'R Package manager', yet I get this error saying that it's trying to load the package above it in the list, 'tcltk'. Anybody know why this is happening? Is there another way to load it? Thanks, James
[R] Prediction plot for logistic regression output
How do I construct a figure showing predicted-value plots for the dependent variable as a function of each explanatory variable (separately) using the results of a logistic regression? It would also be helpful to know how to show uncertainty in the prediction (95% CI or SE). Thanks
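One standard recipe (sketched here with simulated data, since the original model is not shown): predict on the link scale with se.fit=TRUE, form the interval there, and back-transform through the inverse link so the band stays inside [0, 1]:

```r
# Simulated single-predictor example; in a multi-predictor model you would
# vary one explanatory variable and hold the others at typical values.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial)

newd <- data.frame(x = seq(min(x), max(x), length.out = 100))
pr <- predict(fit, newd, type = "link", se.fit = TRUE)   # linear predictor scale
upper <- plogis(pr$fit + 1.96 * pr$se.fit)               # back-transform CI
lower <- plogis(pr$fit - 1.96 * pr$se.fit)

plot(newd$x, plogis(pr$fit), type = "l", ylim = c(0, 1),
     xlab = "x", ylab = "Pr(y = 1)")
lines(newd$x, upper, lty = 2)
lines(newd$x, lower, lty = 2)
```

Forming the interval on the link scale and then transforming is generally preferred to predicting with type="response", whose symmetric intervals can escape [0, 1].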
Re: [R] merging and working with big data sets
I can't speak for ff and filehash, but bigmemory's data structure doesn't allow "clever" merges (for actually good reasons). However, it is still probably less painful (and faster) than other options, though we don't implement it: we leave it to the user because details may vary depending on the example and the code is trivial. - Allocate an empty new filebacked big.matrix of the proper size. - Fill it in chunks (typically a column at a time if you can afford the RAM overhead, or a portion of a column at a time). Column operations are more efficient than row operations (again, because of the internals of the data structure). - Because you'll be using filebackings, RAM limitations won't matter other than the overhead of copying each chunk. I should note: if you used separated=TRUE, each column would have a separate binary file, and a "smart" cbind() would be possible simply by manipulating the descriptor file. Again, not something we advise or formally provide, but it wouldn't be hard. Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
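The "allocate, then fill in chunks" recipe above might look like the following sketch (file names and dimensions are hypothetical, and rnorm() stands in for reading each real column):

```r
library(bigmemory)

# allocate an empty filebacked big.matrix of the final merged size
out <- filebacked.big.matrix(nrow = 1e6, ncol = 20, type = "double",
                             backingfile = "merged.bin",
                             descriptorfile = "merged.desc")

# fill a column at a time; only one column's worth of RAM is needed at once
for (j in 1:20) {
  out[, j] <- rnorm(1e6)   # stand-in for reading/computing column j
}
```

Because the result is filebacked, the total size can exceed RAM; only the chunk currently being copied has to fit in memory, which is exactly the point made above.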
Re: [R] big data and lmer
Though bigmemory, ff, and other big-data solutions (databases, etc.) can help easily manage massive data, their data objects are not natively compatible with all the advanced functionality of R. Exceptions include lm and glm (both ff and bigmemory support this via Lumley's biglm package), kmeans, and perhaps a few other things. In many cases, it's just a matter of someone deciding to port a tool/analysis of interest to one of these different object types -- we welcome collaborators and would be happy to offer advice if you want to adapt something for bigmemory structures! Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Re: [R] [Fwd: adding more columns in big.matrix object of bigmemory package]
For good reasons (having to do with avoiding copies of massive things) we leave such merging to the user: create a new filebacking of the proper size, and fill it (likely a column at a time, assuming you have enough RAM to support that). Jay On Fri, Dec 17, 2010 at 2:16 AM, utkarshsinghal wrote:
>
> Hi,
>
> With reference to the mail below, I have large datasets, coming from various different sources, which I can read into a filebacked big.matrix using library bigmemory. I want to merge them all into one 'big.matrix' object. (Later, I want to run regression using library 'biglm'.)
>
> I have been trying unsuccessfully to do this for quite some time now. Can you please suggest some way? Am I missing some already available function?
>
> Even the following functionality will work for me: just appending more columns to an existing big.matrix object (not merging). The individual datasets are small enough to be read into usual R; just the combined dataset is huge.
>
> Any thoughts are welcome.
>
> Thanks,
> Utkarsh
>
> Original Message
> Subject: adding more columns in big.matrix object of bigmemory package
> Date: Thu, 16 Dec 2010 18:29:38 +0530
> From: utkarshsinghal
> To: r help
>
> Hi all,
>
> Is there any way I can add more columns to an existing filebacked big.matrix object?
>
> In general, I want a way to modify an existing big.matrix object, i.e., add rows/columns, rename colnames, etc. I tried the following:
>
> library(bigmemory)
> x = read.big.matrix("test.csv", header=T, type="double", shared=T, backingfile="test.backup", descriptorfile="test.desc")
> x[,"v4"] = "new"
> Error in mmap(j, colnames(x)) :
>   Couldn't find a match to one of the arguments.
> (The above functionality is presently available for usual data.frames in R.)
>
> Thanks in advance,
> Utkarsh
-- John W. 
Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bigmemory: Error Running Example
It seems very likely you are working on a 32-bit version of R, but it's still a little surprising that you would have a problem with any single year. Please tell us the operating system and version of R. Did you preprocess the airline CSV file using the utilities provided on bigmemory.org? If you don't, then anything character will be converted to NA. Is your R environment empty, or did you have other objects in memory? It might help to just do some tests yourself:

x <- big.matrix(nrow=100, ncol=10, ... other options ...)

Make sure it works, then increase the size until you get a failure. This sort of exercise is extremely helpful in situations like this. Jay

Subject: [R] Bigmemory: Error Running Example. Hi, I am trying to run the bigmemory example provided on http://www.bigmemory.org/ The example runs on the "airline data" and generates a summary of the csv files:

library(bigmemory)
library(biganalytics)
x <- read.big.matrix("2005.csv", type="integer", header=TRUE,
                     backingfile="airline.bin",
                     descriptorfile="airline.desc", extraCols="Age")
summary(x)

This runs fine for the provided csv for year 1987 (size=121MB). However, for big files like the one for year 2005 (size=639MB), it gives the following errors:

Error in filebacked.big.matrix(nrow = nrow, ncol = ncol, type = type, : Problem creating filebacked matrix.
Error: object 'x' not found
Error in summary(x) : error in evaluating the argument 'object' in selecting a method for function 'summary'

The output from running memory.limit() is: [1] 2047. [memory.profile() output garbled in the archive; omitted] Anyone who has previously worked with bigmemory before could throw some light on it. 
Were you able to run the examples successfully? Thanks in advance. Harsh Yadav -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay <http://www.stat.yale.edu/%7Ejay> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] DNA sequence Fst
Hi, I want to analyse DNA sequence data (mtDNA) in R, e.g. calculate Fst, heterozygosity, and similar summary statistics. Package adegenet converts the DNA sequence data to retain only the polymorphic sites and then calculates Fst, but is there any other way to do this? I mean, analyse the DNA sequence as it is and calculate the statistics? Thanks!
Re: [R] bigmemory doubt
By far the easiest way to achieve this would be to use the bigmemory C++ structures in your program itself. However, if you do something on your own (but fundamentally have a column-major matrix in shared memory), it should be possible to play around with the pointer with R/bigmemory to accomplish this, yes. Feel free to email us directly for advice. Jay > Message: 153 > Date: Wed, 8 Sep 2010 10:52:19 +0530 (IST) > From: "raje...@cse.iitm.ac.in" > To: r-help > Subject: [R] bigmemory doubt > Message-ID: > <1204692515.13855.1283923339865.javamail.r...@mail.cse.iitm.ac.in> > Content-Type: text/plain > > Hi, > Is it possible for me to read data from shared memory created by a vc++ > program into R using bigmemory? -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] "Bayesian change point" package bcp 2.2.0 available
Version 2.2.0 of package bcp is now available. It replaces the suggests of NetWorkSpaces (previously used for optional parallel MCMC) with the dependency on package foreach, giving greater flexibility and supporting a wider range of parallel backends (see doSNOW, doMC, etc...). For those unfamiliar with foreach (thanks to Steve Weston for this contribution), it's a beautiful and highly portable looping construct which can run sequentially or in parallel based on the user's actions (rather than the programmer's choices). We think other package authors might want to consider taking advantage of it for tasks that might be computationally intensive and could be easily done in parallel. Some vignettes are available at http://cran.r-project.org/web/packages/foreach/index.html. Jay Emerson & Chandra Erdman -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] "Bayesian change point" package bcp 2.2.0 available
Version 2.2.0 of package bcp is now available. It replaces the suggests of NetWorkSpaces (previously used for optional parallel MCMC) with the dependency on package foreach, giving greater flexibility and supporting a wider range of parallel backends (see doSNOW, doMC, etc...). For those unfamiliar with foreach (thanks to Steve Weston for this contribution), it's a beautiful and highly portable looping construct which can run sequentially or in parallel based on the user's actions (rather than the programmer's choices). We think other package authors might want to consider taking advantage of it for tasks that might be computationally intensive and could be easily done in parallel. Some vignettes are available at http://cran.r-project.org/web/packages/foreach/index.html. Jay Emerson & Chandra Erdman (Apologies, the first version of this announcement was not plain-text.) -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] bigmemory 4.2.3
The long-promised revision to bigmemory has arrived, with package 4.2.3 now on CRAN. The mutexes (locks) have been extracted and will be available through package synchronicity (on R-Forge, soon to appear on CRAN). Initial versions of packages biganalytics and bigtabulate are on CRAN, and new versions which resolve the warnings and have streamlined CRAN-friendly configurations will appear shortly. Package bigalgebra will remain on R-Forge for the time being as the user-interface is developed and the configuration possibilities expand. For more information, please feel free to email us or visit http://www.bigmemory.org/. Jay Emerson & Mike Kane -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lm without intercept
No. This is a cute problem, though: the definition of R^2 changes without the intercept, because the "empty" model used for calculating the total sums of squares always predicts 0 (so the total sums of squares are the sums of squares of the observations themselves, without centering around the sample mean). Your interpretation of the p-value for the intercept in the first model is also backwards: 0.9535 is extremely weak evidence against the hypothesis that the intercept is 0. That is, the intercept might be near zero, but it could also be something very different. With a standard error of 229, your 95% confidence interval for the intercept (if you trusted it based on other things) would have a margin of error of well over 400. If you told me that an intercept of, say, 350 or 400 were consistent with your knowledge of the problem, I wouldn't blink. This is a very small data set: if you sent an R command such as

x <- c(x1, x2, ..., xn)
y <- c(y1, y2, ..., yn)

you might even get some more interesting feedback. One of the many good intro stats textbooks might also be helpful as you get up to speed. Jay - Original post: Message: 135 Date: Fri, 18 Feb 2011 11:49:41 +0100 From: Jan To: "R-help@r-project.org list" Subject: [R] lm without intercept

Hi, I am not a statistics expert, so I have this question. A linear model gives me the following summary:

Call:
lm(formula = N ~ N_alt)

Residuals:
    Min      1Q  Median      3Q     Max
-110.30  -35.80  -22.77   38.07  122.76

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.5177   229.0764   0.059   0.9535
N_alt         0.2832     0.1501   1.886   0.0739 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 
1

Residual standard error: 56.77 on 20 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.151,  Adjusted R-squared: 0.1086
F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386

The regression is not very good (high p-value, low R-squared). The Pr value for the intercept seems to indicate that it is zero with a very high probability (95.35%). So I repeat the regression forcing the intercept to zero:

Call:
lm(formula = N ~ N_alt - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-110.11  -36.35  -22.13   38.59  123.23

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
N_alt 0.292046   0.007742   37.72   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 55.41 on 21 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.9855,  Adjusted R-squared: 0.9848
F-statistic: 1423 on 1 and 21 DF,  p-value: < 2.2e-16

1. Is my interpretation correct?
2. Is it possible that just by forcing the intercept to become zero, a bad regression becomes an extremely good one?
3. Why doesn't lm suggest a value of zero (or near zero) by itself if the regression is so much better with it?

Please excuse my ignorance. Jan Rheinländer -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
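The R^2 point in the reply above is easy to reproduce. In the no-intercept fit, summary.lm computes R^2 as 1 - RSS/sum(y^2) rather than 1 - RSS/sum((y - mean(y))^2), so whenever the response is far from zero the no-intercept R^2 is inflated. A small simulation (hypothetical data, not the poster's):

```r
set.seed(2)
x <- runif(30, 100, 200)
y <- 300 + 0.3 * x + rnorm(30, sd = 50)   # true intercept far from 0

summary(lm(y ~ x))$r.squared        # R^2 with intercept: modest
summary(lm(y ~ x - 1))$r.squared    # R^2 without intercept: close to 1

# the no-intercept R^2 reproduced by hand: TSS is taken around 0
fit0 <- lm(y ~ x - 1)
1 - sum(resid(fit0)^2) / sum(y^2)
```

The no-intercept model fits *worse* here (its residual sum of squares is at least as large), yet its reported R^2 is far higher -- the two R^2 values are simply not comparable.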
Re: [R] Kolmogorov-smirnov test
Taylor Arnold and I have developed a package ks.test (available on R-Forge in beta version) that modifies stats::ks.test to handle discrete null distributions for one-sample tests. We also have a draft of a paper we could provide (email us). The package uses the methodology of Conover (1972) and Gleser (1985) to provide exact p-values. It also corrects an algorithmic problem with stats::ks.test in the calculation of the test statistic. This is not a bug, per se, because it was never intended to be used this way. We will submit this new function for inclusion in package stats once we're done testing. So, for example:

# With the default ks.test (ouch):
> stats::ks.test(c(0,1), ecdf(c(0,1)))

        One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0.5, p-value = 0.5
alternative hypothesis: two-sided

# With our new function (what you would want in this toy example):
> ks.test::ks.test(c(0,1), ecdf(c(0,1)))

        One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0, p-value = 1
alternative hypothesis: two-sided

Original Message: Date: Mon, 28 Feb 2011 21:31:26 +1100 From: Glen Barnett To: tsippel Cc: r-help@r-project.org Subject: Re: [R] Kolmogorov-smirnov test. It's designed for continuous distributions. See the first sentence here: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test K-S is conservative on discrete distributions. On Sat, Feb 19, 2011 at 1:52 PM, tsippel wrote:
> Is the kolmogorov-smirnov test valid on both continuous and discrete data?
> I don't think so, and the example below helped me understand why.
>
> A suggestion on testing the discrete data would be appreciated.
>
> Thanks, -- John W. 
Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Exception while using NeweyWest function with doMC
Simon, Though we're pleased to see another use of bigmemory, it really isn't clear that it is gaining you anything in your example; anything like as.big.matrix(matrix(...)) still consumes full RAM for both the inner matrix() and the new big.matrix -- is the filebacking really necessary? It also doesn't appear that you are making use of shared memory, so I'm unsure what the gains are. However, I don't have any particular insight into the subsequent problem with NeweyWest (which doesn't seem to be using the big.matrix objects). Jay -- Message: 32 Date: Sat, 27 Aug 2011 21:37:55 +0200 From: Simon Zehnder To: r-help@r-project.org Subject: [R] Exception while using NeweyWest function with doMC. Dear R users, I am using R right now for a simulation of a model that needs a lot of memory. Therefore I use the *bigmemory* package and - to make it faster - the *doMC* package. See my code posted on http://pastebin.com/dFRGdNrG < snip > ----- -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Re: [R] Installation of bigmemory fails
Premal, Package authors generally welcome direct emails. We've been away from this project since the release of 2.13.0 and I only just noticed the build errors. These generally occur because of some (usually small and solvable) problem with compilers and the BOOST libraries. We'll look at it and see what we can do. Please email us if you don't hear back in the next week or so. Thanks, Jay --- Hello All, I tried to install the bigmemory package from a CRAN mirror site and received the following output while installing. Any idea what's going on and how to fix it? The system details are provided below.

--- begin error messages ---
* installing *source* package 'bigmemory' ...
checking for Sun Studio compiler... no
checking for Darwin... yes
** libs
g++45 -I/usr/local/lib/R/include -I../inst/include -fpic -O2 -fno-strict-aliasing -pipe -Wl,-rpath=/usr/local/lib/gcc45 -c BigMatrix.cpp -o BigMatrix.o
g++45 -I/usr/local/lib/R/include -I../inst/include -fpic -O2 -fno-strict-aliasing -pipe -Wl,-rpath=/usr/local/lib/gcc45 -c SharedCounter.cpp -o SharedCounter.o
g++45 -I/usr/local/lib/R/include -I../inst/include -fpic -O2 -fno-strict-aliasing -pipe -Wl,-rpath=/usr/local/lib/gcc45 -c bigmemory.cpp -o bigmemory.o
bigmemory.cpp: In function 'bool TooManyRIndices(index_type)':
bigmemory.cpp:40:27: error: 'powl' was not declared in this scope
*** Error code 1
Stop in /tmp/Rtmpxwe3p4/R.INSTALL4f539336/bigmemory/src.
ERROR: compilation failed for package 'bigmemory'
* removing '/usr/local/lib/R/library/bigmemory'
The downloaded packages are in '/tmp/RtmpMZCOVp/downloaded_packages'
Updating HTML index of packages in '.Library'
Making packages.html ... done
Warning message:
In install.packages("bigmemory") :
  installation of package 'bigmemory' had non-zero exit status
--- end error messages ---

It's a 64-bit FreeBSD 7.2 system running R version 2.13.0. Thanks, Premal -- John W. 
Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory limit for Windows 64bit build of R
Alan, More RAM will definitely help. But if you have an object needing more than 2^31-1 ~ 2 billion elements, you'll hit a wall regardless. This could be particularly limiting for matrices. It is less limiting for data.frame objects (where each column could be 2 billion elements). But many R analytics under the hood use matrices, so you may not know up front where you could hit a limit. Jay Original message I have a Windows Server 2008 R2 Enterprise machine, with 64bit R installed running on 2 x Quad-core Intel Xeon 5500 processor with 24GB DDR3 1066 Mhz RAM. I am seeking to analyse very large data sets (perhaps as much as 10GB), without the addtional coding overhead of a package such as bigmemory(). My question is this - if we were to increase the RAM on the machine to (say) 128GB, would this become a possibility? I have read the documentation on memory limits and it seems so, but would like some additional confirmation before investing in any extra RAM. - -- John W. Emerson (Jay) Associate Professor of Statistics, Adjunct, and Acting Director of Graduate Studies Department of Statistics Yale University http://www.stat.yale.edu/~jay [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
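The ceiling Jay describes can be checked directly in R: in the R versions of that era, no single object could hold more than .Machine$integer.max elements, regardless of installed RAM.

```r
# The per-object element limit, independent of RAM:
.Machine$integer.max      # 2147483647, i.e. 2^31 - 1 elements
floor(sqrt(2^31 - 1))     # so a square matrix tops out near 46340 x 46340
```

For a data.frame the limit applies per column, which is why wide data frames can go further than an equivalent matrix, as noted above.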
[R] bigmemory on Solaris
At one point we might have gotten something working (older version?) on Solaris x86, but were never successful on Solaris sparc that I remember -- it isn't a platform we can test and support. We believe there are problems with BOOST library compatibilities. We'll try (again) to clear up the other warnings in the logs, though. !-) We should also revisit the possibility of a CRAN BOOST library for use by a small group of packages (like bigmemory) which might make patches to BOOST easier to track and maintain. This might improve things in the long run. Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] efficient coding with foreach and bigmemory
First, we strongly recommend 64-bit R. Otherwise, you may not be able to scale up as far as you would like. Second, as I think you realize, with big objects you may have to do things in chunks. I generally recommend working a column at a time rather than in blocks of rows if possible (better performance, particularly if the filebacking is used because the matrices exceed RAM), and you may find that alternative data organization can really pay off. Keep an open mind. Third, you really need to avoid this runif(1,...) usage. It can't possibly be efficient. If a single call to runif() doesn't work, break it into chunks, certainly, but going down to chunks of size 1 just can't make any sense. Fourth, although you aren't there yet, once you get to the point where you are trying to do things in parallel with foreach and bigmemory, you *may* need to place the following inside your foreach loop to make use of the shared memory properly:

mdesc <- describe(m)
foreach(...) %dopar% {
  require(bigmemory)
  m <- attach.big.matrix(mdesc)
  # now operate on m
}

I say *may* because the backend doMC (not available on Windows) does not require this, but the other backends do; otherwise, the workers will not be able to properly address the shared-memory or filebacked big.matrix. Some documentation on bigmemory.org may help, and feel free to email us directly. Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
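To illustrate the third point above, the difference is between one runif() call per element and one vectorized call per chunk. A sketch with hypothetical dimensions, filling an ordinary matrix a column at a time:

```r
# Chunked generation: one vectorized runif() call per column, rather than
# calling runif(1, ...) two million times.
n <- 1e5
m <- matrix(0, nrow = n, ncol = 10)
for (j in 1:ncol(m)) {
  m[, j] <- runif(n)     # one call generates the whole column
}
```

The same column-at-a-time pattern carries over directly to a filebacked big.matrix, where it also keeps the per-chunk RAM footprint to a single column.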
Re: [R] Foreach (doMC)
> P.S. Is there any particular reason why there are so seldom answers to posts > regarding foreach and all these doMC/doSMP packages? Do so few people use > these packages, or does this have anything to do with the commercial origin of > these packages? Jannis, An interesting question. I'm a huge fan of foreach and the parallel backends, and have used foreach in some of my packages. It leaves the choice of backend to the user, rather than forcing a particular environment. If you like multicore, great -- the package doesn't care. Someone else may use doSNOW. No problem. To answer your question, foreach was originally written (primarily, at least) by Steve Weston, previously of REvolution Computing. It, along with some of the parallel backends (perhaps all at this point, I'm out of touch), is available open source. Hence, I'd argue that the "commercial origin" is a moot point -- it doesn't matter, it will always be available, and it's really useful. Steve is no longer with REvolution, however, and I can't speak for the responsiveness/interest of the current REvolution folks on this point. Scanning R-help daily for things relating to my own packages is something I try to do, but it doesn't always happen. I would like to think foreach is widely used -- it does have a growing list of reverse depends/suggests. And it was updated as recently as last May, I just noticed. http://cran.r-project.org/web/packages/foreach/index.html Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
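[Editor's sketch, not part of the original post: the backend-agnostic design described above means the same loop body runs under any registered backend, or sequentially via %do% when none is registered. Only the foreach package is assumed.]

```r
library(foreach)

# The loop body is identical whichever backend (doMC, doSNOW, ...) the
# user registers; %do% always runs sequentially, so this works anywhere.
res <- foreach(i = 1:3, .combine = c) %do% {
  i^2
}
print(res)  # 1 4 9
```

Switching to parallel execution is then just `registerDoMC()` (or another backend's register call) plus `%dopar%` in place of `%do%`; the body is untouched.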
Re: [R] Foreach (doMC)
Jannis, I'm not completely sure I understand your first point, but maybe someone from REvolution will weigh in. Nobody is forcing anyone to purchase any products, and there are attractive alternatives such as CRAN R and RStudio (to name two). This issue has arisen many times on the various lists, and you are welcome to search the archives and read many very intelligent, thoughtful opinions. As for foreach, etc.: if you have fairly focused questions (preferably with a reproducible example if there is a problem) and if you have done some reading on the available examples of using it, then you might try joining the r-sig-...@r-project.org group. Clearly there are far more users of "core" R, and hence "mainstream" questions on r-help are likely to be answered more quickly (on average) than specialized questions. Regards, Jay

On Thu, Oct 20, 2011 at 4:27 PM, Jannis wrote:
> Dear list members, dear Jay,
>
> Well, I personally do not care about Revolution Analytics selling their
> products, as this is also compatible with the idea of many open source
> licences -- especially as Revolution provides their packages to the community,
> and it is everybody's personal choice to buy their special R version.
>
> I was just wondering about this issue because usually most questions on r-help
> are answered pretty soon and by many different people, and I had the
> impression that this is not the case for posts regarding the
> foreach/doMC/doSMP etc. packages. This may, however, also be due to the
> probably limited use of these packages for most users, who do not need these
> high-performance computing things. Or it was just my personal perception or
> pure chance.
>
> Thanks, however, to the authors of such packages! They were of great help to
> me on several occasions and I have deep respect for everybody devoting his
> time to open source software!
>
> Jannis

-- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Re: [R] bigmemory
To answer your first question about read.big.matrix(): we don't know what your acc3.dat file is, but it doesn't appear to have been detected as a standard file (like a CSV file), or -- perhaps -- it doesn't even exist (or doesn't exist in your current directory)? Next:

> In addition, I am planning to do a multiple imputation with the MICE package
> using the data read by the bigmemory package.
> So usually, the multiple imputation code is like this:
> imp=mice(data.frame,m=50,seed=1234,print=F)
> A data.frame is required. How can I change the big.matrix class
> generated by the bigmemory package to a data.frame?

Please read the help files for bigmemory -- only matrix-like objects are supported. However, the more serious problem is that you can't expect to run just any R function on a big.matrix (or on an ff object, if you check out ff for some nice features). In particular, for large data sets you would likely use up all of your RAM (other reasons are more subtle and important, but out of place in this reply). Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
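[Editor's sketch, not part of the original post: one workable compromise implied by the reply above is to extract a RAM-sized sub-block of the big.matrix into an ordinary data.frame before calling mice(). The descriptor file name and chunk size below are invented.]

```r
# Extract a manageable chunk of a big.matrix into an ordinary data.frame
# for use with mice(). Assumes bigmemory and mice are installed; the
# descriptor file "acc3.desc" and row range are hypothetical.
library(bigmemory)
library(mice)

m <- attach.big.matrix("acc3.desc")   # re-attach a filebacked big.matrix
rows <- 1:10000                        # a chunk small enough to fit in RAM
df <- as.data.frame(m[rows, ])         # m[i, j] returns an ordinary matrix

imp <- mice(df, m = 5, seed = 1234, print = FALSE)
```

The key point is that subscripting a big.matrix (`m[rows, ]`) yields a plain R matrix, which regular functions like mice() can consume; the full object never needs to be copied into RAM at once.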
Re: [R] bigmemory
R internally uses 32-bit integers for indexing (though this may change). For this and other reasons, these external objects with specialized purposes (larger-than-RAM, shared memory) simply can't behave exactly like R objects. In the best case, some R functions will work. Others would simply break. Others would perhaps work if the problem is small enough, but would choke on the creation of temporary objects in memory. I understand your sentiment, but it isn't that easy. If you are interested, however, we do provide examples of authoring functions in C++ which can work interchangeably on both matrix and big.matrix objects. Jay

> Hi Jay,
>
> I have a question about your reply.
>
> You mentioned that "the more serious problem is that you can't expect to
> run just any R function on a big.matrix (or on an ff object, if you check
> out ff for some nice features)."
>
> I am confused about why the packages cannot communicate with each other. I
> understand that maybe for some programming or statistical reasons, one
> package needs its own "class" so that a specific algorithm can be implemented.
> However, R is a statistical programming environment, and one of its advantages
> is the abundance of packages under the R structure. If different packages
> generate different kinds of objects that cannot be recognized and used for
> further analysis by other packages, then each package would appear to be
> similar to ordinary standalone software, e.g., SAS, MATLAB... and this could
> reduce R's overall ability to handle complicated analysis situations.
>
> This is just a general thought.
>
> Thank you very much.
>
> --
> ya

-- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
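[Editor's sketch, not part of the original post: the 32-bit indexing point above can be made concrete in base R. At the time of this thread, R had no long-vector support, so no ordinary object could hold more than 2^31 - 1 elements.]

```r
# R's standard integer type is 32-bit, so the largest index is 2^31 - 1.
limit <- .Machine$integer.max
stopifnot(limit == 2^31 - 1)   # 2147483647

# A 10^5 x 10^5 double matrix would need 10^10 elements (~80 GB of RAM),
# beyond both the element limit and most machines -- hence external
# objects like big.matrix cannot simply behave as ordinary R matrices.
needed <- 1e5 * 1e5
stopifnot(needed > limit)
```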
[R] [R-pkgs] Package bigmemory now available on CRAN
Package "bigmemory" is now available on CRAN. A brief abstract follows: Multi-gigabyte data sets challenge and frustrate R users even on well-equipped hardware. C/C++ and Fortran programming can be helpful, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The new package bigmemory bridges this gap, implementing massive matrices in memory (managed in R but implemented in C++) and supporting their basic manipulation and exploration. It is ideal for problems involving the analysis in R of manageable subsets of the data, or when an analysis is conducted mostly in C++. In a Unix environment, the data structure may be allocated to shared memory with transparent read and write locking, allowing separate processes on the same computer to share access to a single copy of the data set. This opens the door for more powerful parallel analyses and data mining of massive data sets. -- John W. Emerson (Jay) Assistant Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay ___ R-packages mailing list [EMAIL PROTECTED] https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] R package building
I agree with others that the packaging system is generally easy to use, and between the "Writing R Extensions" documentation and other scattered sources (including these lists) there shouldn't be many obstacles. Using package.skeleton() is a great way to get started: I'd recommend having just one data object and one new function in the session for starters. You can build up from there. I've only run into time-consuming walls on more advanced, obscure issues. For example: the "Suggests:" field in DESCRIPTION generated quite some debate back in 2005, but until I found that thread in the email lists I didn't understand the issue. For completeness, I'll round out this discussion, hoping I'm correct. In essence, I think the choice of the word "Suggests:" was intended for the package user, not for the developer. The user isn't required to have a suggested package in order to load and use the desired package. But the developer is required (for R CMD check) to have the suggested package in order to avoid warnings or failures. This does, actually, make sense, because we assume a developer would want/need to check features that involve the suggested package. In a few isolated cases (I think I had one of them), this caused a problem, where a desired suggested package isn't distributed by CRAN on all platforms, so I would risk getting into trouble with R CMD check on the platform without the suggested package. But this is pretty obscure, and the issue was obviously well-debated in the past. The addition of a line or two about this in "Writing R Extensions" would be friendly (the current content is correct and minimally sufficient, I believe). Maybe I should draft this and submit it to the group. Secondly, I would advise a newcomer to the packaging system to avoid S4 at first. Ultimately, I think it's pretty cool.
But, for example, documentation on proper documentation (to handle the man pages correctly) has puzzled me, and even though I can create a package with S4 that passes R CMD check cleanly, I'm not convinced I've got it quite right. If someone has recently created more documentation or a 'white paper' on this, please do spread the word. Thanks to all who have worked -- and continue to work -- on the system! Jay

> Subject: [R] R package building
>
> In a few days I'll give a talk on R package development and my
> personal experience, at the 3rd Free / Libre / Open Source Software
> (FLOSS) Conference which will take place on May 27th & 28th 2008, in
> the National Technical University of Athens, in Greece.
>
> I would appreciate it if you could share
> your thoughts with me: what are today's obstacles to R package
> building, according to your
> opinion and personal experience?
>
> Thanks,
> --
> Angelos I. Markos
> Scientific Associate, Dpt of Exact Sciences, TEI of Thessaloniki, GR
> "I'm not an outlier; I just haven't found my distribution yet"

-- John W. Emerson (Jay) Assistant Professor of Statistics Director of Graduate Studies Department of Statistics Yale University http://www.stat.yale.edu/~jay
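[Editor's sketch, not part of the original post: a concrete starting point for the package.skeleton() advice above -- one function, one data object. The names "mypkg", "dsquare", and "ddata" are invented for illustration.]

```r
# Minimal package.skeleton() sketch: one function and one data object,
# as recommended above. All names here are invented.
dsquare <- function(x) x^2
ddata <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Writes tempdir()/mypkg with R/, data/, man/ stubs and a DESCRIPTION:
package.skeleton(name = "mypkg",
                 list = c("dsquare", "ddata"),
                 path = tempdir())
```

After this, the usual workflow is to edit the generated Rd stubs and DESCRIPTION, then run `R CMD build mypkg` and `R CMD check mypkg`.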
Re: [R] Using very large matrix
Corrado, Package bigmemory has undergone a major re-engineering and will be available soon (available now in a Beta version upon request). The version currently on CRAN is probably of limited use unless you're on Linux. bigmemory may be useful to you for data management, at the very least, where

x <- filebacked.big.matrix(10^8, 10^8, init=n, type="double")

would accomplish what you want, using filebacking (disk space) to hold the object. But even this requires 64-bit R (Linux or Mac, or perhaps a Beta version of the Windows 64-bit R that REvolution Computing is working on). Subsequent operations (e.g. extraction of a small portion for analysis) are then easy enough:

y <- x[1,]

would give you the first row of x as an object y in R. Note that x is not itself an R matrix, and most existing R analytics can't work on x directly (and would max out the RAM if they tried, anyway). Feel free to email me for more information (and this invitation applies to anyone who is interested in this). Cheers, Jay

# Dear friends,
#
# I have to use a very large matrix, something of the sort of
# matrix(10^8, 10^8, n), where n is something numeric of the sort 0.xx
#
# I have not found a way of doing it. I keep getting the error
#
# Error in matrix(nrow = 10^8, ncol = 10^8, 0.2) : too many elements specified
#
# Any suggestions? I have searched the mailing list, but to no avail.
#
# Best,
# --
# Corrado Topi
#
# Global Climate Change & Biodiversity Indicators
# Area 18, Department of Biology
# University of York, York, YO10 5YW, UK
# Phone: +44 (0) 1904 328645, E-mail: ct...@york.ac.uk

-- John W. Emerson (Jay) Assistant Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
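[Editor's sketch, not part of the original post: a fuller version of the filebacked.big.matrix() usage above, with much smaller dimensions so it actually runs on modest hardware. The backing/descriptor file names are invented.]

```r
# Filebacked big.matrix sketch following the reply above. Assumes the
# bigmemory package with filebacking support; file names are made up,
# and dimensions are kept small here so the example is runnable.
library(bigmemory)

x <- filebacked.big.matrix(1000, 1000, init = 0.2, type = "double",
                           backingfile = "x.bin",
                           descriptorfile = "x.desc",
                           backingpath = tempdir())

y <- x[1, ]   # extract row 1 as an ordinary R vector
stopifnot(length(y) == 1000, all(y == 0.2))

# Later (or from another R session), re-attach without re-reading the data:
x2 <- attach.big.matrix(file.path(tempdir(), "x.desc"))
```

The backing file keeps the data on disk, so only the portions you subscript (like `x[1, ]`) are materialized as ordinary R objects in RAM.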
Re: [R] Using very large matrix
Steve et al., The old version is still on CRAN, but I strongly encourage anyone interested to email me directly and I'll make the new version available. In fact, I wouldn't mind just pulling the old version off of CRAN, but of course that's not a great idea. !-) Jay

On Mon, Mar 2, 2009 at 8:47 AM, Steve Friedman wrote:
> I'm very interested in the bigmemory package for Windows 32-bit
> environments. Who do I need to contact to request the Beta version?
>
> Thanks
> Steve
>
> Steve Friedman Ph.D.
> Spatial Statistical Analyst
> Everglades and Dry Tortugas National Park
> 950 N Krome Ave (3rd Floor)
> Homestead, Florida 33034
>
> steve_fried...@nps.gov
> Office (305) 224-4282
> Fax (305) 224-4147
>
> On 03/02/2009 10:46 AM GMT, Corrado wrote (to john.emer...@yale.edu, Tony Breyal, cc r-help@r-project.org):
>
> Thanks a lot!
>
> Unfortunately, the R package I have to use for my research was only released
> on 32-bit R on 32-bit MS Windows, and only closed source. I normally use
> 64-bit R on 64-bit Linux :)
>
> I tried to use the bigmemory on CRAN with 32-bit Windows, but I had some
> serious problems.
>
> Best,
>
> --
> Corrado Topi
>
> Global Climate Change & Biodiversity Indicators
> Area 18, Department of Biology
> University of York, York, YO10 5YW, UK
> Phone: +44 (0) 1904 328645, E-mail: ct...@york.ac.uk

-- John W. Emerson (Jay) Assistant Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay