Re: [R] matrix manipulation question
Many thanks,
Stéphane

On 30 March 2015 at 10:42, peter dalgaard wrote:
>
>> On 30 Mar 2015, at 09:59, Stéphane Adamowicz wrote:
>>
>> However, in order to help me understand, would you be so kind as to give me a matrix or data.frame example where "complete.cases(X)== T" or "complete.cases(X)== TRUE" would give some unwanted result?
>
> The standard problem with T for TRUE is if T has been used for some other purpose, like a time variable. E.g., T <- 0 ; complete.cases(X)==T.
>
> complete.cases()==TRUE is just silly, like (x==0)==TRUE or ((x==0)==TRUE)==TRUE.
>
> (However, notice that x==TRUE is different from as.logical(x) if x is numeric, so ifelse(x,y,z) may differ from ifelse(x==TRUE,y,z).)
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
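A minimal sketch of both points Peter makes, using a small made-up data frame X (an assumption; no data from the original question is shown). It shows how a masked T silently inverts the row selection, and how x == TRUE and as.logical(x) disagree for numeric x.

```r
# Made-up data frame with two incomplete rows
X  <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
cc <- complete.cases(X)
print(cc)                           # TRUE FALSE FALSE

# Problem 1: T is an ordinary variable and can be overwritten
T <- 0                              # e.g. T used as a time variable
masked <- complete.cases(X) == T    # now a comparison with 0
print(masked)                       # FALSE TRUE TRUE: the *incomplete* rows
rm(T)                               # remove the mask so T is base TRUE again

# Idiomatic: index with the logical vector directly, no comparison needed
X_complete <- X[complete.cases(X), ]
print(nrow(X_complete))             # 1

# Problem 2: x == TRUE is not as.logical(x) when x is numeric
x <- c(0, 1, 2)
eq_true  <- x == TRUE               # TRUE is coerced to 1
as_logic <- as.logical(x)           # any non-zero value becomes TRUE
print(eq_true)                      # FALSE TRUE FALSE
print(as_logic)                     # FALSE TRUE TRUE
print(ifelse(x, "y", "z"))          # "z" "y" "y"
print(ifelse(x == TRUE, "y", "z"))  # "z" "y" "z"
```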
Re: [R] Multiple Plots using ggplot
Hi All,

Sorry for the shape of the data, which was not good enough. This is how my data look.

I want to make multiple plots with the ggplot function from a data frame of many columns. I want to plot only Start.of.Rain..i., Start.of.Rain..ii. and Start.of.Rain..iii., and I failed to make it. What I want is to compare Start.of.Rain..i., Start.of.Rain..ii. and Start.of.Rain..iii. by plotting vertical lines. I also need to add points to the plot to be able to separate them. The x-axis must be the Date column. Thanks!

Here is how the data look and how I tried to make it.

Date        Number.of.Rain.Days  Total.rain  Start.of.Rain..i.  Start.of.Rain..ii.  Start.of.Rain..iii.
1952-01-01                   86    1139.952                 92                 239                  112
1953-01-01                   96     977.646                 98                  98                  112
1954-01-01                  114    1382.014                 92                  92                  120
1955-01-01                  119    1323.086                100                 100                  125
1956-01-01                  123    1266.444                 92                  92                  119
1957-01-01                  124    1235.964                 92                  92                  112

Here is how I tried to solve the problem.

df1 <- data.frame(data)
df1
df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
df2

ggplot(df2, aes(Date, value)) + geom_line(aes(colour = "red"), type = "h")

Kindly, any help is welcome. Thanks

Regards,
Frederic.

Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile: (+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Mar 31, 2015 at 9:24 AM, Jeff Newmiller wrote:
> This is no better because (a) you are still posting using HTML format, and (b) using printed output loses the internal representation of the data. The dput function is very helpful for solving this. [1]
>
> [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On March 30, 2015 10:56:48 PM PDT, Frederic Ntirenganya wrote:
>> Hi Stephen,
>>
>> Sorry, the data came through badly. Here is the head of the data.
>>
>> > head(data)
>>         Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii. Start.Rain..iv.
>> 1 1952-01-01                  86   1139.952                92                239                 112             112
>> 2 1953-01-01                  96    977.646                98                 98                 112             112
>> 3 1954-01-01                 114   1382.014                92                 92                 120             120
>> 4 1955-01-01                 119   1323.086               100                100                 125             174
>> 5 1956-01-01                 123   1266.444                92                 92                 119             119
>> 6 1957-01-01                 124   1235.964                92                 92                 112             112
>>
>> On Mon, Mar 30, 2015 at 5:34 PM, stephen sefick wrote:
>>> Hi Frederic,
>>>
>>> Can you provide a minimal reproducible example including either real data (dput), or simulated data that mimics your situation? This will allow more people to help.
>>>
>>> Stephen
>>>
>>> On Mon, Mar 30, 2015 at 8:39 AM, Frederic Ntirenganya wrote:
>>>> Dear All,
>>>>
>>>> I want to make multiple plots with the ggplot function from a data frame of many columns. I want to plot only str1, str2 and str3, and I failed to make it. What I want is to compare str1, str2 and str3 by plotting vertical lines. I also need to add points to the plot to be able to separate them.
>>>>
>>>> Here is how the data look and how I tried to make it.
>>>>
>>>> Date      NumberofRaindays  TotalRains  str1  str2  str3
>>>> 1/1/1952                86      1360.5    92   120   112
>>>> 1/1/1953                96      1100      98   100   110
>>>> ...
>>>> ...
>>>>
>>>> df1 <- data.frame(data)
>>>> df1
>>>> df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
>>>> df2
>>>>
>>>> ggplot(df2, aes(Date, value)) + geom_line(aes(colour = "red"), type = "h")
>>>>
>>>> Kindly, any help is welcome. Thanks
>>>>
>>>> Regards,
>>>> Frederic.
[R] MethComp exported object namespace error
Hi everyone,

I am using the MCmcmc function of the MethComp package and receive the following error:

Error: 'coda.samples' is not an exported object from 'namespace:coda'

I emailed the package author last week but haven't had a reply. I have installed JAGS 3.4.0 as required by MethComp. I am using R 3.1.2 and the MethComp version currently on CRAN (1.22.1).

I am not a regular R user, so I haven't had any luck making sense of the error, though there are references to namespaces in the package check results here: http://cran.itam.mx/web/checks/check_results_bxc_at_steno.dk.html#MethComp. I'm not sure if that's relevant.

Any suggestions would be appreciated. Apologies if I haven't provided any required information. The following shows my code and the error (code taken from the package author's text 'Comparing Clinical Measurement Methods', Bendix Carstensen, section 7.5.3):

> library(MethComp)
> data(ox)
> ox <- Meth(ox)
> m3 <- MCmcmc(ox, IxR=TRUE, n.iter=5)

Comparison of 2 methods, using 354 measurements on 61 items, with up to 3 replicate measurements,
(replicate values are in the set: 1 2 3 )
( 2 * 61 * 3 = 366 ):

No. items with measurements on each method:
       #Replicates
Method   1  2  3 #Items #Obs: 354 Values:  min  med  max
  CO     1  4 56     61       177        22.2 78.6 93.5
  pulse  1  4 56     61       177        24.0 75.0 94.0

Simulation run of a model
with - method by item and item by replicate interaction:
- using 4 chains run for 5 iterations (of which 25000 are burn-in),
- monitoring every 25 values of the chain:
- giving a posterior sample of 4000 observations.

Loading required package: coda
Linked to JAGS 3.4.0
Loaded modules: basemod,bugs
Initialization and burn-in:
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 2868

Initializing model

|++| 100%
Sampling:
Error: 'coda.samples' is not an exported object from 'namespace:coda'

Thanks,
Kylie.
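The message comes from a pkg::name lookup (most likely a coda::coda.samples call, or an equivalent, inside MCmcmc) for a name that the coda namespace does not export. A minimal base-R sketch of how to check which installed package exports a given name; the helper is_exported() is hypothetical, written only for this illustration, and the last two lines merely report what is installed on the local machine. Whether rjags is the right package to look in is an assumption worth confirming against the rjags and MethComp documentation.

```r
# Hypothetical helper (not part of MethComp, coda or rjags):
# TRUE if package `pkg` is installed and exports an object called `name`
is_exported <- function(name, pkg) {
  requireNamespace(pkg, quietly = TRUE) && name %in% getNamespaceExports(pkg)
}

print(is_exported("median", "stats"))            # TRUE
print(is_exported("no_such_function", "stats"))  # FALSE

# pkg::name on a name the namespace does not export fails with
# the same kind of message as in the MCmcmc output above
msg <- tryCatch(stats::no_such_function,
                error = function(e) conditionMessage(e))
print(msg)

# Which locally installed package exports coda.samples?
# (the answers depend entirely on what is installed here)
print(is_exported("coda.samples", "coda"))
print(is_exported("coda.samples", "rjags"))
```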
Re: [R] Error in lm() with very small (close to zero) regressor
I found a fix for my problem using fastLm() from the RcppEigen package, with either the Jacobi singular value decomposition (SVD) (method 4) or the method based on the eigenvalue-eigenvector decomposition of X'X (method 5) of the fastLm function:

install.packages("RcppEigen")
library(RcppEigen)

# simulated data: response and two regressors, x1 with a mean close to zero
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)   # regressor matrix (not needed by the formula interface below)

# method = 4: Jacobi SVD of the model matrix
bFE <- fastLm(y ~ x1 + x2, method = 4)
bFE

Call:
fastLm.formula(formula = y ~ x1 + x2, method = 4)

Coefficients:
(Intercept)                  x1                  x2
9.94832839474159414          0.12293             0.00440078989949841

Best,
Raluca
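For readers who want to see what the two decompositions Raluca mentions compute, here is a base-R sketch (no RcppEigen) of ordinary least squares via the SVD of the design matrix and via the eigendecomposition of X'X, checked against lm(). It reuses the distributions from the post plus a fixed seed (an assumption, for reproducibility), assumes the design matrix has full column rank, and does not reproduce fastLm's own rank handling or printed output.

```r
set.seed(1)  # assumption: fixed seed so the comparison is reproducible
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)

# Design matrix with an explicit intercept column
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# OLS via the SVD X = U D V':  beta = V D^{-1} U' y
s <- svd(X)
beta_svd <- drop(s$v %*% (crossprod(s$u, y) / s$d))

# OLS via the eigendecomposition X'X = V L V':  beta = V L^{-1} V' X'y
e <- eigen(crossprod(X), symmetric = TRUE)
beta_eig <- drop(e$vectors %*% (crossprod(e$vectors, crossprod(X, y)) / e$values))

# Reference fit; coefficient order matches the columns of X
beta_lm <- coef(lm(y ~ x1 + x2))
print(rbind(svd = beta_svd, eigen = beta_eig, lm = beta_lm))
```

The eigen route works with X'X, whose condition number is the square of that of X, which is one reason SVD- or QR-based solvers are usually preferred when columns differ greatly in scale.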
Re: [R] Multiple Plots using ggplot
By failing to take the advice given to you, you make it harder to help you. Learn to control your email program to send plain text, and learn to use the dput function.

With regard to this function call:

> ggplot(df2, aes(Date,value)) +

I highly recommend using named parameters in the aes call. Also, if you want different values of "variable" to be plotted with different colours, you should map that column to the colour aesthetic:

ggplot(df2, aes(x=Date, y=value, colour=variable)) +

The "type" argument applies to base graphics rather than ggplot graphics, and you should never put fixed values inside the aes call. Since colour has already been taken care of, you can give no parameters in the geom_line call:

geom_line()

So, all together:

ggplot(df2, aes(x=Date, y=value, colour=variable)) + geom_line()

but I cannot test it because you have not followed my other advice.
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On March 31, 2015 12:55:11 AM PDT, Frederic Ntirenganya wrote:
> Hi All,
>
> Sorry for the shape of data which was not good enough. This is how my data look like.
> [...]
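Putting Jeff's suggestion together with Frederic's request for points, here is a sketch using the first six rows from Frederic's later dput() output. To stay self-contained it reshapes with base R's stack() instead of reshape::melt (an assumption; melt would do the same job), keeps only the three Start.of.Rain columns, and maps the resulting variable column to colour. The ggplot2 part only runs if that package is installed.

```r
# First six rows of Frederic's data (values from his dput() output)
rain <- data.frame(
  Date                = as.Date(paste0(1952:1957, "-01-01")),
  Start.of.Rain..i.   = c(92L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..ii.  = c(239L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..iii. = c(112L, 112L, 120L, 125L, 119L, 112L)
)

# Long format: one row per (Date, series); stack() returns columns values, ind
start_cols <- c("Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.")
long <- data.frame(Date = rep(rain$Date, times = length(start_cols)),
                   stack(rain[start_cols]))
names(long) <- c("Date", "value", "variable")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # colour is *mapped* to a column inside aes(); nothing fixed goes in aes()
  p <- ggplot(long, aes(x = Date, y = value, colour = variable)) +
    geom_line() +
    geom_point()
  print(p)
}
```

If vertical lines from zero (what type = "h" gives in base graphics) are wanted instead, geom_linerange(aes(ymin = 0, ymax = value)) is one ggplot2 option, though overlapping series would then hide each other at equal values.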
Re: [R] Multiple Plots using ggplot
Your data and post are still not provided in one of the formats described here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. I am unsure of what you want to do, but I have made a reproducible example that might help.

zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
1952-01-01 86 1139.952 92 239 112
1953-01-01 96 977.646 98 98 112
1954-01-01 114 1382.014 92 92 120
1955-01-01 119 1323.086 100 100 125
1956-01-01 123 1266.444 92 92 119
1957-01-01 124 1235.964 92 92 112"

library(reshape)
library(ggplot2)

Data <- read.table(text = zz, header = TRUE)
Data$Date <- as.Date(Data$Date)   # store dates as Date rather than text

df1 <- data.frame(Data)

# long format: one row per Date, Number.of.Rain.Days and measured variable
df2 <- melt(df1, id = c('Date', 'Number.of.Rain.Days'))

# drop the Total.rain rows so that only the Start.of.Rain columns remain
df3 <- df2[-grep("Total.rain", df2$variable), ]

# one panel per variable
qplot(Date, value, data = df3) + facet_wrap(~variable)

On Tue, Mar 31, 2015 at 2:55 AM, Frederic Ntirenganya wrote:
> Hi All,
>
> Sorry for the shape of data which was not good enough. This is how my data look like.
> [...]
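One caveat about the df2[-grep(...), ] idiom in Stephen's example: grep() returns the positions of matches, so when nothing matches it returns integer(0), and indexing with -integer(0) selects no rows at all. A small sketch with a few made-up rows (the data frame long is hypothetical) shows the difference from a logical test.

```r
# Hypothetical long-format rows, shaped like melt() output
long <- data.frame(variable = c("Total.rain", "Start.of.Rain..i.", "Start.of.Rain..ii."),
                   value    = c(1139.952, 92, 239))

# grep() gives the matching row positions; the minus sign drops them
kept_match <- long[-grep("Total.rain", long$variable), ]
print(nrow(kept_match))      # 2, as intended

# If the pattern matches nothing, grep() returns integer(0),
# and long[-integer(0), ] keeps NO rows
kept_nomatch <- long[-grep("no.such.column", long$variable), ]
print(nrow(kept_nomatch))    # 0

# A logical test keeps every non-matching row in both cases
kept_logical <- long[long$variable != "no.such.column", ]
print(nrow(kept_logical))    # 3
```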
Re: [R] Multiple Plots using ggplot
The data you supplied is still in a useless format. Please send it to us in dput format (and don't post in HTML).

Here is a complete example of creating a data.frame and converting it to a usable data set that readers on R-help can use.

##=Start Example===##
# Simple example data set in a data.frame
data1 <- data.frame(xx = 1:20, yy = sample(letters[1:26], 20, replace = TRUE), zz = rnorm(20))

dput(data1)  # convert to dput() format for transferring to other users

# dput() result. Copy and paste back into your editor
structure(list(xx = 1:20, yy = structure(c(6L, 3L, 7L, 12L, 1L, 1L, 2L,
7L, 9L, 6L, 8L, 7L, 9L, 5L, 4L, 10L, 11L, 4L, 8L, 11L), .Label = c("a",
"f", "g", "h", "i", "j", "k", "o", "p", "u", "w", "z"), class = "factor"),
    zz = c(0.379202224643519, -0.293649882956148, 2.27761155645142,
    0.0378126031936277, 0.518138385757923, 1.11655160886907, -1.64262245261915,
    1.11341365979718, -0.184737977758355, 0.439361470235051, 1.2597110753159,
    -0.795425331570368, 0.974654694801041, -0.309087884123705, -1.55929705211554,
    0.147715827800676, -0.542626171203849, 0.745294589678554, -0.254290052908619,
    0.939894889209173)), .Names = c("xx", "yy", "zz"),
    row.names = c(NA, -20L), class = "data.frame")

# Read data back into standard R format, calling the data "dat1"
dat1 <- structure(list(xx = 1:20, yy = structure(c(6L, 3L, 7L, 12L, 1L, 1L, 2L,
7L, 9L, 6L, 8L, 7L, 9L, 5L, 4L, 10L, 11L, 4L, 8L, 11L), .Label = c("a",
"f", "g", "h", "i", "j", "k", "o", "p", "u", "w", "z"), class = "factor"),
    zz = c(0.379202224643519, -0.293649882956148, 2.27761155645142,
    0.0378126031936277, 0.518138385757923, 1.11655160886907, -1.64262245261915,
    1.11341365979718, -0.184737977758355, 0.439361470235051, 1.2597110753159,
    -0.795425331570368, 0.974654694801041, -0.309087884123705, -1.55929705211554,
    0.147715827800676, -0.542626171203849, 0.745294589678554, -0.254290052908619,
    0.939894889209173)), .Names = c("xx", "yy", "zz"),
    row.names = c(NA, -20L), class = "data.frame")

dat1
##=End Example===##

John Kane
Kingston ON Canada

> -----Original Message-----
> From: ntfr...@gmail.com
> Sent: Tue, 31 Mar 2015 10:55:11 +0300
> To: jdnew...@dcn.davis.ca.us
> Subject: Re: [R] Multiple Plots using ggplot
>
> Hi All,
>
> Sorry for the shape of data which was not good enough. This is how my data look like.
> [...]
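To show what dput() is for: it prints an R expression that rebuilds the object, so pasting its output into R (or, as here, parsing it) gives back the same data, including column types such as Date and integer. A minimal sketch using two rows of values from this thread:

```r
# Two rows of Frederic's data, with a Date column and an integer column
original <- data.frame(Date = as.Date(c("1952-01-01", "1953-01-01")),
                       Start.of.Rain..i. = c(92L, 98L))

# dput() writes an R expression that rebuilds the object; capture it as text
txt <- capture.output(dput(original))
cat(txt, sep = "\n")

# Pasting that text into R (here: parsing and evaluating it) recreates the data
restored <- eval(parse(text = txt))
print(all.equal(original, restored))   # TRUE
```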
Re: [R] Multiple Plots using ggplot
Hi All,

Thanks for the help. I want to plot some of the columns on the same graph, not all of them. Sorry, I failed to follow the instructions. Here is the output of dput(), but I don't know how it works.

> dput(head(data))
structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
-5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
"Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
"Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
), class = "data.frame")

I think I need the subset function, then melt. Here is the approach I used:

d <- subset(df1, select = c(Date, Start.of.Rain..i., Start.of.Rain..ii., Start.of.Rain..iii.))
d
d2 <- melt(d, id = 'Date', variable_name = 'Start')

ggplot(d2, aes(Date, value)) + geom_line(aes(colour = start), type = "h")

but the error is:

Don't know how to automatically pick scale for object of type function. Defaulting to continuous
Error in data.frame(colour = function (x, ...)  :
  arguments imply differing number of rows: 0, 183

Thanks,
Frederic.

On Tue, Mar 31, 2015 at 4:20 PM, stephen sefick wrote:
> Your data and post is still not provided in one of the formats provided here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. I am unsure of what you want to do, but I have made a reproducible example that might help.
> [...]
Re: [R] Plotting using tapply function output
Reproducibility
https://github.com/hadley/devtools/wiki/Reproducibility
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

John Kane
Kingston ON Canada

> -----Original Message-----
> From: amc5...@gmail.com
> Sent: Mon, 30 Mar 2015 16:07:05 -0700
> To: r-help@r-project.org
> Subject: [R] Plotting using tapply function output
>
> Hello,
>
> I am trying to plot the hourly standard deviation of wind speeds from 13 different measured locations over many years. I imported the data using readLines and into a data frame called finalData. Using tapply, I determined the standard deviation of the wind speed (ws) for each hour (hour) from every location (stn) using this command line:
>
> statHour = tapply(finalData$ws, list(finalData$stn, finalData$hour), sd)
>
> I want to plot the standard deviation for each hour of the day, with hours on the x-axis, the standard deviation on the y-axis, and each station as a different color. I've managed to get a boxplot of this, but ideally I'd like a scatter plot to determine the variations between each instrument throughout the day. The boxplot command is this:
>
> boxplot(statHour, names=colnames(statHour), xlab='Hour of the Day', ylab='Standard Deviation of Wind Speed')
>
> I also tried to make a data frame of the tapply output, but it ends up using the hours as the column names instead of putting them into the data frame. Please help!!
>
> I have R version 3.1.1
>
> Thanks a lot,
> Alexandra
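The tapply() result in this question is a station-by-hour matrix, so one way to get the scatter plot Alexandra describes is to plot its transpose with matplot(), one coloured series per station, and to turn the matrix into a long data frame with as.data.frame(as.table(...)). A sketch on simulated data: finalData below is made up (only 3 stations) purely to mimic the described columns stn, hour and ws.

```r
set.seed(42)
# Hypothetical stand-in for finalData: 3 stations x 24 hours x 10 days
finalData <- data.frame(
  stn  = rep(paste0("S", 1:3), each = 240),
  hour = rep(0:23, times = 30),
  ws   = abs(rnorm(720, mean = 5, sd = 2))
)

# Same call as in the question: rows = stations, columns = hours
statHour <- tapply(finalData$ws, list(finalData$stn, finalData$hour), sd)
names(dimnames(statHour)) <- c("stn", "hour")

# Long data frame: one row per (station, hour) with the sd as a column
long <- as.data.frame(as.table(statHour), responseName = "sd")
long$hour <- as.numeric(as.character(long$hour))
print(head(long))

# Scatter plot: hour on x, sd on y, one colour per station
hours <- as.numeric(colnames(statHour))
cols  <- seq_len(nrow(statHour))
matplot(hours, t(statHour), type = "b", pch = 19, lty = 1, col = cols,
        xlab = "Hour of the Day", ylab = "Standard Deviation of Wind Speed")
legend("topright", legend = rownames(statHour), col = cols, pch = 19, lty = 1)
```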
Re: [R] Multiple Plots using ggplot
Hi John,

Sorry for providing useless data. Here I am interested only in the Tmin and Tmax columns. I want to use the same approach as with the previous data, and I want to plot them on the same graph, not on separate graphs. Thanks

> dput(head(BUTemp))
structure(list(Year = c(1971L, 1971L, 1971L, 1971L, 1971L,
1971L), Month = c(2L, 2L, 2L, 2L, 2L, 2L), Day = 1:6, Rain = c(0,
0, 0, 0, 0, 0), Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5), Tmin = c(13.5,
13.2, 12.7, 12.7, 12.2, 14)), .Names = c("Year", "Month", "Day",
"Rain", "Tmax", "Tmin"), row.names = c(NA, 6L), class = "data.frame")

Regards,
Frederic.

Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/
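A minimal sketch for the Tmin/Tmax question, assuming the goal is one panel with one line per variable. It rebuilds a Date from the Year, Month and Day columns of the posted BUTemp, reshapes to long form in base R (reshape2::melt(BUTemp, id.vars = "Date", measure.vars = c("Tmax", "Tmin")) should give the same three columns), and builds the plot only if ggplot2 is installed. The names temps_long and p are illustrative.

```r
# BUTemp exactly as posted via dput(head(BUTemp))
BUTemp <- structure(list(Year = c(1971L, 1971L, 1971L, 1971L, 1971L, 1971L),
                         Month = c(2L, 2L, 2L, 2L, 2L, 2L), Day = 1:6,
                         Rain = c(0, 0, 0, 0, 0, 0),
                         Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5),
                         Tmin = c(13.5, 13.2, 12.7, 12.7, 12.2, 14)),
                    .Names = c("Year", "Month", "Day", "Rain", "Tmax", "Tmin"),
                    row.names = c(NA, 6L), class = "data.frame")

# Build a proper Date from the three integer columns
BUTemp$Date <- as.Date(sprintf("%d-%02d-%02d", BUTemp$Year, BUTemp$Month, BUTemp$Day))

# Long format in base R: one row per Date and variable (Date, variable, value)
temps_long <- data.frame(
  Date     = rep(BUTemp$Date, times = 2),
  variable = factor(rep(c("Tmax", "Tmin"), each = nrow(BUTemp))),
  value    = c(BUTemp$Tmax, BUTemp$Tmin)
)

# Both series on the same graph, distinguished by colour
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(temps_long, ggplot2::aes(Date, value, colour = variable)) +
    ggplot2::geom_line() +
    ggplot2::geom_point()
  print(p)
}
```

Mapping colour to the single variable column (rather than plotting Tmax and Tmin as two separate layers) is what makes ggplot2 draw a legend automatically.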
Re: [R] Multiple Plots using ggplot
The error message is very informative. You named a column in the melted data "Start", and told ggplot to use "start". "start" is a function. R is case sensitive.

On Tue, Mar 31, 2015 at 8:46 AM, Frederic Ntirenganya wrote:
> Hi All,
>
> Thanks for the help. I want to plot some of the columns on the same graph, not all of them. Sorry, I failed to follow the instructions. Here is the output of *dput()*, but I don't know how it works.
>
>> dput(head(data))
> structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
> -5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
> 96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
> 1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
> 98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
> 92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
> 125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
> 119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
> 228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
> "Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
> "Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
> ), class = "data.frame")
>
> I think I need the subset function, then melt. Here is the approach I used:
>
> d <- subset(df1,
>   select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))
> d
> d2 <- melt(d , id = 'Date', variable_name = 'Start')
>
> ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start),type = "h")
>
> but the error is:
>
> Don't know how to automatically pick scale for object of type function. Defaulting to continuous
> Error in data.frame(colour = function (x, ...)  :
>   arguments imply differing number of rows: 0, 183
>
> Thanks,
>
> Frederic.
>
> Frederic Ntirenganya
> Maseno University,
> African Maths Initiative,
> Kenya.
> Mobile:(+254)718492836
> Email: fr...@aims.ac.za
> https://sites.google.com/a/aims.ac.za/fredo/
>
> On Tue, Mar 31, 2015 at 4:20 PM, stephen sefick wrote:
>
>> Your data and post are still not provided in one of the formats described here:
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
>> I am unsure of what you want to do, but I have made a reproducible example that might help.
>>
>> zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
>> 1952-01-01 86 1139.952 92 239 11
>> 1953-01-01 96 977.646 98 98 11
>> 1954-01-01 114 1382.014 92 92 12
>> 1955-01-01 119 1323.086 100 100 12
>> 1956-01-01 123 1266.444 92 92 11
>> 1957-01-01 124 1235.964 92 92 11"
>>
>> library(reshape)
>> library(ggplot2)
>>
>> Data <- read.table(text=zz, header = TRUE)
>>
>> df1 <-data.frame(Data)
>>
>> df2 <- melt(df1 , id = c('Date', 'Number.of.Rain.Days'))
>>
>> df3 <- df2[-grep("Total.rain", df2$variable),]
>>
>> qplot(Date,value, data=df3) +facet_wrap(~variable)
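The case-sensitivity point can be checked directly. This sketch rebuilds the melted data in base R from the posted dput() (keeping only the three start-of-rain columns) and names the measure column Start, as variable_name = 'Start' does in the reshape package's melt(). Note that reshape2's melt() spells that argument variable.name, so with reshape2 a variable_name argument is silently ignored and the column stays "variable". The object names dat, cols and d2 are illustrative.

```r
# Start-of-rain columns from the posted dput(head(data))
dat <- data.frame(
  Date = as.Date(c(-6575, -6209, -5844, -5479, -5114, -4748), origin = "1970-01-01"),
  Start.of.Rain..i.   = c(92L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..ii.  = c(239L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..iii. = c(112L, 112L, 120L, 125L, 119L, 112L)
)
cols <- c("Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.")

# Long format with the measure column called "Start" (capital S)
d2 <- data.frame(Date  = rep(dat$Date, times = length(cols)),
                 Start = factor(rep(cols, each = nrow(dat)), levels = cols),
                 value = unlist(dat[cols], use.names = FALSE))

# R is case sensitive: "Start" is a column, `start` is stats::start()
"Start" %in% names(d2)   # TRUE
"start" %in% names(d2)   # FALSE
is.function(start)       # TRUE, which is why aes(colour = start) found a function

# Corrected mapping: use the column name exactly as spelled
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d2, ggplot2::aes(Date, value, colour = Start)) +
    ggplot2::geom_line() +
    ggplot2::geom_point()
  print(p)
}
```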
Re: [R] Multiple Plots using ggplot
Hi Frederic,

Thanks for sending the data in dput() format. All it does is convert a data set into a standardized format (a perfect copy) that anyone with R can read. People have different setups and defaults for reading data, so what you read into R as a character variable may be a factor when I read it in, and we can have some serious problems just trying to decide what the data look like.

I had a look at your code and it is confused. See my comments below.

d <- subset(df1, select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))
d
d2 <- melt(d , id = 'Date', variable_name = 'Start')

# You do not have any variable in your data.frame called "Start".
# Reshape2 seems to have just ignored variable_name = 'Start' and did the melt based on id = 'Date'.
# Strange, I would have expected an error, but it worked!
d2 <- melt(d , id = 'Date')
# will give you exactly the same result.

ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start),type = "h")

Again, you do not have a variable (column name) called 'start'. You have three column names (variables) in d2: "Date", "variable" and "value".

Point one: you have no variable called start. Point two: what is type = "h" doing here? As far as I can see, geom_line() has no such option; see ?geom_line. I think you are confusing base graphics arguments (type =) with ggplot2 commands. Have a look at http://www.cookbook-r.com/Graphs/Shapes_and_line_types/ for some examples that show the differences.

Below is what I think you may be trying to do (note I use dat1 for the data.frame rather than your df1).

###==
library(reshape2)
library(ggplot2)

dat1 <- structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
  -5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
  96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
  1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
  98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
  92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
  125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
  119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
  228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
  "Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
  "Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L),
  class = "data.frame")

dd <- subset(dat1, select = c(Date, Start.of.Rain..i., Start.of.Rain..ii., Start.of.Rain..iii.))
d2 <- melt(dd, id = 'Date')

# One line per start-of-rain definition
ggplot(d2, aes(Date, value)) + geom_line(aes(colour = variable))

# Side-by-side bars per year (bar heights taken directly from value)
ggplot(d2, aes(Date, value)) +
  geom_bar(stat = "identity", position = "dodge", aes(fill = variable))
###==

John Kane
Kingston ON Canada
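On the type = "h" point: ggplot2 has no type argument, but vertical lines from zero with a point on top (what type = "h" plus points gives in base graphics) can be drawn with geom_segment() and geom_point(). This sketch assumes a fixed 60-day horizontal shift per series, an arbitrary choice, so the three start-of-rain values for the same year stay distinguishable, and it falls back to base graphics plot(type = "h") when ggplot2 is not installed. The names shift, d2 and x are illustrative.

```r
# Start-of-rain columns from the posted dput(); one row per year
dat1 <- data.frame(
  Date = as.Date(c(-6575, -6209, -5844, -5479, -5114, -4748), origin = "1970-01-01"),
  Start.of.Rain..i.   = c(92L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..ii.  = c(239L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..iii. = c(112L, 112L, 120L, 125L, 119L, 112L)
)
cols <- c("Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.")

# Long format plus an assumed per-series shift (in days) to avoid overplotting
shift <- c(-60, 0, 60)
d2 <- data.frame(
  Date     = rep(dat1$Date, times = length(cols)),
  variable = factor(rep(cols, each = nrow(dat1)), levels = cols),
  value    = unlist(dat1[cols], use.names = FALSE)
)
d2$x <- d2$Date + shift[as.integer(d2$variable)]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Vertical segment from 0 up to each value, with a point marking the top
  p <- ggplot2::ggplot(d2, ggplot2::aes(x = x, y = value, colour = variable)) +
    ggplot2::geom_segment(ggplot2::aes(xend = x, y = 0, yend = value)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Date", y = "Start of rains")
  print(p)
} else {
  # Base graphics equivalent using type = "h"
  plot(d2$x, d2$value, type = "h", col = as.integer(d2$variable),
       xlab = "Date", ylab = "Start of rains")
  points(d2$x, d2$value, col = as.integer(d2$variable), pch = 19)
}
```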
Re: [R] data.frame: data-driven column selections that vary by row??
I think we need some data and code. On reproducibility, see:
https://github.com/hadley/devtools/wiki/Reproducibility
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

John Kane
Kingston ON Canada

> -----Original Message-----
> From: r...@catwhisker.org
> Sent: Mon, 30 Mar 2015 06:50:59 -0700
> To: r-help@r-project.org
> Subject: [R] data.frame: data-driven column selections that vary by row??
>
> Sorry if that's confusing: I'm probably confused. :-(
>
> I am collecting and trying to analyze data regarding the performance of computer systems.
>
> After extracting the data from its repository, I wrote a Perl script to generate a (relatively) simple CSV, each record of which contains:
> * a POSIXct timestamp
> * a hostname
> * a collection of metrics for the interval identified by the timestamp, specific to the host in question, as well as some factors to group the hosts (e.g., whether it's in a "control" vs. a "test" group; a broad categorization of how the host is provisioned; which version of the software it was running at the time...). (Each metric and factor is in a uniquely named column.)
>
> As extracted from the repository, there were several records for each such hostname/timestamp pair; e.g., there would be separate records for:
> * Input bandwidth utilization for network interface 1
> * Output bandwidth utilization for network interface 1
> * Input bandwidth utilization for network interface 2
> * Output bandwidth utilization for network interface 2
>
> (And the same field would be used for each of these, the interpretation being driven by the content of other fields in the record.)
>
> Working with the data as described immediately above directly in R seemed... daunting, at best: thus the excursion into Perl.
>
> And for some of the data, what I have works well enough.
>
> But now I also want to analyze information from disk drives, and things get messy (as far as I can see).
>
> First, each disk drive has a collection of 17 metrics (such as "busy_pct", "kb_per_transfer_read", and "transfers_per_second_write"), as well as a factor ("dev_type"). Each also has a device name that is unique within the host where it resides (e.g. "da1", "da2", "da3"). (The "dev_type" factor identifies whether the drive is a solid-state device or a spinning disk.)
>
> I have thus made the corresponding columns unique by pasting the drive name and the name of the metric (or factor), separating the two with "_" (e.g. "da7_busy_pct"; "ada0_mb_per_second_write"; "ada4_queue_length"). I am not certain that's the best thing I could have done, and I'm open to changing the approach.
>
> The challenge for me is that different (classes of) machines are provisioned differently; some consequences of that:
> * While da1 may be a spinning disk on host A, that has no bearing on whether the "da1" on host B is a spinning disk or an SSD.
> * Host C may not even have a "da1" device.
> * Host D may be of a type that normally has a "da1", but in this case the drive has failed and has been disabled (so host D won't report anything about "da1").
>
> (I'm not too bothered about the "non-reporting" case, but cite it so we all know about it.)
>
> I expect I will want to be using groupings:
> * All disk devices -- this one is easy.
> * All SSD devices (excluding spinning disks).
> * All spinning disks (excluding SSDs).
>
> I'm having trouble with the latter two (though, certainly, if I solve one, the other is also solved).
>
> Also, for some of the metrics, I will want to sum them; for others, I will want to do other things: find minima or maxima, or average them. So pre-calculating such aggregates in the Perl script isn't something that appeals to me.
>
> Finally (as far as complications go), I'm trying to write the code in such a way that if we deploy a new configuration of machine that has (say) twice as many drives as the biggest one we presently deploy, the code Just Works: I shouldn't need to update the code merely to adapt to another hardware configuration.
>
> I have been able to write a function that takes the data.frame obtained by reading the above-cited CSV and generates a data.frame with a row for each host, showing the "dev_type" of each device on that host; here's an abbreviated (and slightly redacted) copy of its output to illustrate some of the above:
>
>        ada0 ada1 ada2 ada3 ada4 ada5 da30 da31 da32 da33 da34 da35 da36 da3
> host_A ssd ssd hdd hdd hdd hdd hdd hdd hdd hdd hdd hdd hdd hdd
> host_B ssd ssd hdd hdd hdd hdd hdd hdd hdd hdd hdd hdd hdd hdd
> host_G ssd ssd ssd ssd ssd ssd ssd
> host_H ssd ssd ssd ssd ssd ssd ssd
> host_M ssd ssd ssd ssd ssd ssd ssd
> host_N ssd ssd ssd ssd ssd ssd ssd
>
> (That function is written with the explicit assumption(!) that
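One possible approach to the SSD/HDD grouping, offered as a sketch rather than the poster's method: reshape the "<device>_<metric>" columns to long form, one row per timestamp, host and device, so that dev_type becomes an ordinary column usable as a grouping variable and new drive names are picked up without code changes. The helper device_long() and the toy data (values copied from a few rows of the dput() posted later in this thread) are hypothetical; it assumes device names are lower-case letters followed by digits and that every device has the same set of metric columns.

```r
# Hypothetical helper: one row per timestamp/host/device instead of one
# column per device/metric pair (e.g. "da2_busy_pct", "da2_dev_type").
device_long <- function(df, id_cols = c("timestamp", "hostname")) {
  dev_cols <- grep("^[a-z]+[0-9]+_", names(df), value = TRUE)
  devices  <- unique(sub("_.*$", "", dev_cols))        # e.g. "da2", "da20"
  pieces <- lapply(devices, function(d) {
    prefix <- paste0("^", d, "_")                      # "^da2_" does not match "da20_"
    cols   <- grep(prefix, names(df), value = TRUE)
    part   <- df[, c(id_cols, cols), drop = FALSE]
    names(part) <- c(id_cols, sub(prefix, "", cols))   # strip the device prefix
    part$device <- d
    part
  })
  long <- do.call(rbind, pieces)
  long$dev_type <- as.character(long$dev_type)
  # Drop devices a host does not have (reported with an empty dev_type)
  long[!is.na(long$dev_type) & long$dev_type != "", , drop = FALSE]
}

# Toy data: two hosts, two devices; c021 has no da20
bw_demo <- data.frame(
  timestamp = c(1426892400L, 1426892400L), hostname = c("c001", "c021"),
  da2_busy_pct  = c(84.5, 29.5), da2_dev_type  = c("hdd", "ssd"),
  da20_busy_pct = c(79.1, NA),   da20_dev_type = c("hdd", ""),
  stringsAsFactors = FALSE
)
long <- device_long(bw_demo)

# Mean busy % per host and time, separately for SSDs and spinning disks;
# swap FUN for sum, min or max as the metric requires
agg <- aggregate(busy_pct ~ timestamp + hostname + dev_type, data = long, FUN = mean)
agg
```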
Re: [R] Debug package options
Duncan,

Thanks for the help. Since I am the only person using this machine, and I couldn't figure out anywhere else to put the option statement, I put it in the file Rprofile.site in

C:\Program Files\R\R-3.1.2\etc

The option that I wanted was:

options(debug.font = "Consolas 12")

which gave me the right font size and Tk window for debugging with the debug package.

In case you are interested, I use Windows 7 on my Mac via Parallels.

Thanks again,

Best,
KW

> On Mar 30, 2015, at 2:05 PM, Duncan Murdoch wrote:
>
> On 30/03/2015 1:50 PM, Keith S Weintraub wrote:
>> Folks,
>>
>> I would like to change some of the options for the Tk window that pops up when using the debug package.
>>
>> I know how to change the options: e.g. options(debug.font = "Courier 12 italic").
>>
>> Is there a way to "preset" these in my environment so that when debug starts up I have all the options set up the way I want them?
>>
>> Do I do this in a .First file? Does the .First file have to load the debug package every time I start up R?
>>
>> No need to do my work for me. Just point me to the right doc.
>
> See the ?Startup help topic. You probably want to use one of the profile files rather than .First, because .First needs to be in a workspace, and you shouldn't be loading a workspace every time.
>
> Duncan Murdoch
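For reference, a profile entry along these lines is what the thread settles on. It assumes, as described above, that the debug package reads the debug.font option when it opens its Tk window. The same lines could go in R_HOME/etc/Rprofile.site (site-wide) or in a personal .Rprofile (per user); ?Startup documents where R looks for each.

```r
# Lines for Rprofile.site or .Rprofile; local() keeps any helper
# variables out of the global workspace.
local({
  # Font for the Tk window opened by the 'debug' package
  options(debug.font = "Consolas 12")
})
```

Calling R.home("etc") in a running session returns the directory that holds Rprofile.site for that installation.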
[R] Calculating Kendall's tau
I am analyzing trends with the Mann-Kendall test for 31 independent samples, each with a 34-year record. I need to find Kendall's tau for each sample. The data are arranged column-wise (I attached the data). To find Kendall's tau, I wrote this R script:

desta <- read.csv("rainfall.csv", header = TRUE, sep = ",")
require(Kendall)
MK <- function(y) {
  nc <- ncol(y)
  MannKendalltau <- numeric(nc)
  for (i in 2:nc) {
    MannKendalltau[i] <- MannKendall(y[, i])
  }
  MannKendalltau
}
MK(desta)

The displayed result shows both tau and the 2-sided p-value in a disorganized way, but I want only the tau values, presented in an organized manner. Can anyone tell me how to get the tau values displayed in order? Here is my sample result:

[[1]]
[1] 0

[[2]]
[1] 0.4352941
attr(,"Csingle")
[1] TRUE

[[3]]
[1] 0.5462185
attr(,"Csingle")
[1] TRUE

[[4]]
[1] 0.4218487
attr(,"Csingle")
[1] TRUE

Thank you for your guidance.
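A sketch for collecting only tau, assuming (as the original loop starting at column 2 suggests) that the first column holds the year. The list output comes from assigning the whole MannKendall() result, which is a list, into an element of a numeric vector; that silently turns the vector into a list. Extracting $tau, or computing tau in base R, avoids it. For a series without ties, Kendall's tau against the time index equals the Mann-Kendall tau; with ties, cor() returns tau-b, which may differ slightly from MannKendall(). The function mk_tau() and the toy data frame standing in for rainfall.csv are hypothetical.

```r
# Mann-Kendall tau for every series column (column 1 assumed to be the year)
mk_tau <- function(y) {
  series <- y[, -1, drop = FALSE]
  # Base R: Kendall's tau between each series and its time index
  vapply(series, function(x) {
    cor(x, seq_along(x), method = "kendall", use = "complete.obs")
  }, numeric(1))
}

# Toy stand-in for rainfall.csv (hypothetical values)
desta_demo <- data.frame(Year = 1981:1985,
                         s1 = c(1, 2, 3, 4, 5),
                         s2 = c(5, 4, 3, 2, 1),
                         s3 = c(2, 1, 4, 3, 5))
taus <- mk_tau(desta_demo)
print(round(taus, 3))   # a named numeric vector, one tau per sample

# Same idea with the Kendall package: keep only the tau component
if (requireNamespace("Kendall", quietly = TRUE)) {
  taus_pkg <- vapply(desta_demo[, -1, drop = FALSE],
                     function(x) as.numeric(Kendall::MannKendall(x)$tau),
                     numeric(1))
  print(round(taus_pkg, 3))
}
```

Because vapply() returns a plain named vector, the 10,360-series case needs no change beyond reading the larger file.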
Re: [R] Calculating Kendall's tau
This sounds like homework. Homework is discouraged on this list (but you might get lucky).

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll
Re: [R] Error in lm() with very small (close to zero) regressor
If you really want your coefficient estimates to be scale-equivariant, you should test those methods for that property. E.g., here are functions that let you check how scaling one predictor affects the estimated coefficients; they should give the same results for any scale factor.

library(RcppEigen)  # provides fastLm()

f <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)), X2 = log(seq_len(n)))) {
    cf <- coef(lm(data = data, Y ~ X1 + I(X2/scale)))
    cf * c(1, 1, 1/scale)
}
g <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)), X2 = log(seq_len(n)))) {
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 4))
    cf * c(1, 1, 1/scale)
}
h <- function(scale = 1, n = 100,
              data = data.frame(Y = seq_len(n), X1 = sqrt(seq_len(n)), X2 = log(seq_len(n)))) {
    cf <- coef(fastLm(data = data, Y ~ X1 + I(X2/scale), method = 5))
    cf * c(1, 1, 1/scale)
}

See how they compare for scale factors between 10^-15 and 10^15. lm() is looking pretty good.

> options(digits=4)
> scale <- 10 ^ seq(-15,15,by=5)
> sapply(scale, f)
               [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
(Intercept)  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393
X1           19.955  19.955  19.955  19.955  19.955  19.955  19.955
I(X2/scale) -20.372 -20.372 -20.372 -20.372 -20.372 -20.372 -20.372
> sapply(scale, g)
                 [,1]    [,2]    [,3]    [,4]    [,5]    [,6]       [,7]
(Intercept) 0.000e+00  -9.393  -9.393  -9.393  -9.393  -9.393 -3.126e+01
X1          2.772e-29  19.955  19.955  19.955  19.955  19.955  1.218e+01
I(X2/scale) 1.474e+01 -20.372 -20.372 -20.372 -20.372 -20.372 -2.892e-29
> sapply(scale, h)
                 [,1]      [,2]    [,3]    [,4]    [,5]       [,6]       [,7]
(Intercept) 0.000e+00 3.807e-20  -9.395  -9.393  -9.393 -3.126e+01 -3.126e+01
X1          2.945e-29 2.772e-19  19.954  19.955  19.955  1.218e+01  1.218e+01
I(X2/scale) 1.474e+01 1.474e+01 -20.369 -20.372 -20.372 -2.892e-19  6.596e-30

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 5:10 AM, RiGui wrote:
> I found a fix to my problem using fastLm() from the RcppEigen package, with either the Jacobi singular value decomposition (SVD) (method 4) or a method based on the eigenvalue-eigenvector decomposition of X'X (method 5 of fastLm):
>
> install.packages("RcppEigen")
> library(RcppEigen)
>
> n_obs <- 1500
> y <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.01235657,0.45)
> x2 <- rnorm(n_obs, 10,3.21)
> X <- cbind(x1,x2)
>
> bFE <- fastLm(y ~ x1 + x2, method =4)
> bFE
>
> Call:
> fastLm.formula(formula = y ~ x1 + x2, method = 4)
>
> Coefficients:
>         (Intercept)          x1                  x2
> 9.94832839474159414     0.12293 0.00440078989949841
>
> Best,
>
> Raluca
Re: [R] Calculating Kendall's tau
OK. But always reply to the list (which I am cc'ing here) so that everyone knows, and re-submit your original post in **PLAIN TEXT**, not HTML, as this is a plain-text list and HTML typically garbles everything. Also, reading and following the posting guide (see the end of this email) generally improves your chance of getting useful help.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Tue, Mar 31, 2015 at 9:24 AM, Desta Yoseph wrote:
> Dear Bert,
> It is not homework. Actually, my real work involves 10,360 samples. But if someone showed me how to do it for the 31-sample dataset, I could manage the large one.
> Hopefully this gives you a hint why I really want someone's help.
> Cheers

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] changing column labels for data frames inside a list
> Date: Mon, 30 Mar 2015 09:54:39 -0400
> From: Vikram Chhatre
> To: r-help@r-project.org
> Subject: [R] changing column labels for data frames inside a list
>
>> summary(mygenfreqt)
>          Length Class  Mode
> dat1.str 59220  -none- numeric
> dat2.str 59220  -none- numeric
> dat3.str 59220  -none- numeric
>
>> head(mylist[[1]])
>            1     2     3     4     5     6     7     8     9    10  11    12
> L0001.1 0.60 0.500 0.325 0.675 0.600 0.500 0.500 0.375 0.550 0.475 0.3 0.275
> L0001.2 0.40 0.500 0.675 0.325 0.400 0.500 0.500 0.625 0.450 0.525 0.6 0.725
>
> I want to change 1:12 to pop1:pop12
>
> mylist<- lapply(mylist, function(e) colnames(e) <- paste0('pop',1:12))
>
> What this is doing is replacing the data frames with just the names pop1:pop12. I just want to replace the column labels.
>
> Thanks for any suggestions.

Some readers have already replied, but here is another option that exploits lapply()'s "..." parameter.

First, we make a reproducible example.

(lista <- list(mtcars, mtcars))

Now, we get the unique number of columns of the data frames in the variable "lista".

(n.cols <- unique(sapply(lista, ncol)))

Finally, we call lapply() with `colnames<-` to change the column names of both data frames in "lista". See lapply()'s "..." parameter (?lapply).

(lista <- lapply(X = lista, FUN = `colnames<-`, paste0("pop", seq_len(n.cols))))
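Both fixes side by side, as a sketch with a hypothetical two-element list of 2 x 12 matrices standing in for mylist. The original anonymous function returned the value of its last expression, the assignment, which is the vector of new names rather than the modified object, so it needs to end by returning e. Alternatively, the replacement function `colnames<-` can be handed straight to lapply(), with the new names supplied through "..." as in the reply above.

```r
# Hypothetical stand-in for 'mylist': two 2 x 12 numeric matrices
mylist <- list(dat1 = matrix(runif(24), nrow = 2),
               dat2 = matrix(runif(24), nrow = 2))

# Fix 1: return the modified object, not the result of the assignment
fixed1 <- lapply(mylist, function(e) {
  colnames(e) <- paste0("pop", seq_len(ncol(e)))
  e
})

# Fix 2: call the replacement function directly; the names vector is
# passed to `colnames<-` as its 'value' argument via lapply()'s "..."
fixed2 <- lapply(mylist, `colnames<-`, paste0("pop", 1:12))

colnames(fixed1$dat1)
```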
Re: [R] how to deal with changing weighting functions
Can you give a concrete, simple example of inputs with expected results? Is phi a function? Of omega 1 and 2? Is the summation over everything through V_d-k?

On Mon, Mar 30, 2015 at 2:58 PM, T.Riedle wrote:
> Hi everybody,
> Does anybody have an idea how I can generate tau according to the attached formula? The point is that phi changes with k, and I thought I could do it with a for loop in R, but I am not sure how.
>
> Could anyone help me?
> Thanks in advance.
[R] Randomly interleaving data frames while preserving order
Hello.

I am trying to simulate recruitment in a randomized trial. Suppose I have three streams (strata) of patients represented by these data frames.

df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

What I need to do is construct a data frame with all of these combined, where the order of selection from one of the three data frames is randomized, but once a stratum is selected, patients are selected sequentially from that data frame.

To see what I'm looking to achieve, suppose the first five subjects were to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The expected result should look like this:

rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
   strat id  pid
1      1  1 1001
2      2  1 2001
21     1  2 1002
4      3  1 3001
22     2  2 2002

I hope what I'm trying to accomplish makes sense. Maybe I'm missing something obvious, but I really have no idea at the moment how to achieve this elegantly. Since I need to simulate many trial recruitments, it needs to be general and compact.

I appreciate any advice.

Kevin

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
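One compact way to do this, offered as a sketch: stack the strata, draw a random sequence of stratum labels, and map the k-th appearance of stratum j to row k of that stratum. It assumes every patient in every stream is eventually recruited, so the label sequence is a random permutation of the stratum labels; if each arrival should instead come from a stratum chosen with fixed probabilities, the draw would need sample(..., replace = TRUE) plus a check that no stratum runs out. The function name interleave_strata() is hypothetical.

```r
# Hypothetical helper: randomly interleave strata while keeping each
# stratum's rows in their original order.
interleave_strata <- function(dfs) {
  sizes    <- vapply(dfs, nrow, integer(1))
  combined <- do.call(rbind, dfs)                   # stratum j is one contiguous block
  labels   <- rep(seq_along(dfs), times = sizes)    # one label per patient
  draw     <- labels[sample.int(length(labels))]    # random order of strata
  nth      <- ave(seq_along(draw), draw, FUN = seq_along)  # k-th draw from that stratum
  offset   <- c(0, cumsum(sizes))[draw]             # rows preceding that stratum's block
  out      <- combined[offset + nth, ]
  rownames(out) <- NULL
  out
}

df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(42)
recruits <- interleave_strata(list(df1, df2, df3))
head(recruits)
```

Indexing the label vector with sample.int() rather than calling sample() on it directly avoids sample()'s special case for a single numeric value, and the whole simulation is one vectorised subset, so it can be repeated cheaply inside replicate().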
Re: [R] data.frame: data-driven column selections that vary by row??
On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote: > I think we need some data and code > Reproducibility > https://github.com/hadley/devtools/wiki/Reproducibility > > http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example > I apologize for failing to provide that. Here is a quite small subset of the data (with a few edits to reduce excess verbosity in names of things) that still illustrates the challenge I perceive: > dput(bw) structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L, 1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L ), hostname = c("c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002", "c021", "c022", "c041", "c051"), health = c(0.054937499983, 0.25058541667, 1, 1, 0.577784167075767, 0.546805261621527, 0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525, 0.2813125, 0.27087708333, 1, 1, 0.579231349457365, 0.542973020177151 ), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L ), .Label = "2015Q1.2", class = "factor"), role = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("control", "test"), class = "factor"), type = structure(c(3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L), .Label = c("D", "F", "H"), class = "factor"), da20_busy_pct = c(79.1, 62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA, NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("", "hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23, 665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71, 668.96, NA, NA, NA, NA), 
da20_kb_per_xfer_write = c(0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read = c(39.77, 31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24, NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read = c(43.5, 31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6, NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0, 0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA ), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56, 48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA, NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5, 81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2, 74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("", "hdd", "ssd"), class = "factor"), da2_kb_per_xfer_read = c(690.67, 686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01, 594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02, 564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57, 134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61, 268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99), da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8, 2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9, 1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0, 2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), da2_xfers_per_sec_other = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0), da2_xfers_per_sec_read = c(66, 62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61, 226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("timestamp", "hostname", "health", "hw", "fw", "role", "type", "da20_busy_pct", "da20_dev_type", "da20_kb_per_xfer_read", "da20_kb_per_xfer_write", "da20_mb_per_sec_read", "da20_mb_per_sec_write", "da20_ms_per_xactn_read", "da20_ms_per_xactn_write", "da20_Q_length", "da20_xfers_per_sec_other", "da20_xfers_per_sec_read", "da20_xfers_per_sec_write", "da2_busy_pct", "da2_dev_type", "da2_kb_per_xfer_read", "da2_kb_per_xfer_write", "da2_mb_per_sec_read", "da2_mb_per_sec_write", "da2_ms_per_xactn_read", "da2_ms_per_xactn_write", "da2_Q_length", "da2_xfers_per_sec_other", "da2_xfers_per_sec_read", "da2_xfers_per_sec_write"), class = "data.frame", row.name
Re: [R] Randomly interleaving data frames while preserving order
That's a fun one. Here's one possible approach. (Note that it can be done without using a loop, but I find that a loop here increases readability.)

I wrote it to work on a list of data frames. If the selection is random, I'd set it up so that size is passed to the function, but selection is generated within the function using sample().

recruitment <- function(dflist, selection) {
    results <- data.frame(matrix(NA, nrow=length(selection), ncol=ncol(dflist[[1]])))
    colnames(results) <- colnames(dflist[[1]])
    for(i in unique(selection)) {
        results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
    }
    results
}

# and your example:
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

dfall <- list(df1, df2, df3)
touse <- c(1, 2, 1, 3, 1) # could be generated using sample given the size argument
# touse <- sample(seq_along(dfall), size=5, replace=TRUE)

> recruitment(dfall, touse)
  strat id  pid
1     1  1 1001
2     2  1 2001
3     1  2 1002
4     3  1 3001
5     1  3 1003

Sarah

On Tue, Mar 31, 2015 at 1:05 PM, Kevin E. Thorpe wrote:
> Hello.
>
> I am trying to simulate recruitment in a randomized trial. Suppose I have
> three streams (strata) of patients represented by these data frames.
>
> df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
> df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
> df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
>
> What I need to do is construct a data frame with all of these combined where
> the order of selection from one of the three data frames is randomized but
> once a stratum is selected patients are selected sequentially from that data
> frame.
>
> To see what I'm looking to achieve, suppose the first five subjects were to
> come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The expected
> result should look like this:
>
> rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
>    strat id  pid
> 1      1  1 1001
> 2      2  1 2001
> 21     1  2 1002
> 4      3  1 3001
> 22     2  2 2002
>
> I hope what I'm trying to accomplish makes sense. Maybe I'm missing
> something obvious, but I really have no idea at the moment how to achieve
> this elegantly. Since I need to simulate many trial recruitments it needs to
> be general and compact.
>
> I appreciate any advice.
>
> Kevin

--
Sarah Goslee
http://www.functionaldiversity.org
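Following the design described above (pass only the size, draw the selection inside the function), a minimal sketch might look like this. The wrapper name recruit_random() is hypothetical; it assumes strata are drawn with replacement, and because a stratum drawn more often than it has rows would produce NA rows, it stops in that case.

```r
recruitment <- function(dflist, selection) {
  # Pre-allocate an all-NA result with the columns of the first data frame
  results <- data.frame(matrix(NA, nrow = length(selection), ncol = ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for (i in unique(selection)) {
    # The k-th draw from stratum i receives row k of that stratum
    results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)), ]
  }
  results
}

# Hypothetical wrapper: only the size is passed, the selection is drawn inside
recruit_random <- function(dflist, size) {
  selection <- sample(seq_along(dflist), size = size, replace = TRUE)
  # Guard against drawing a stratum more often than it has patients
  counts <- tabulate(selection, nbins = length(dflist))
  if (any(counts > vapply(dflist, nrow, integer(1))))
    stop("a stratum was drawn more times than it has rows")
  recruitment(dflist, selection)
}

df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(1)
recruited <- recruit_random(list(df1, df2, df3), size = 5)
print(recruited)
```

With ten rows per stratum and size = 5 the guard cannot trigger; for full-length simulations, permuting rep(seq_along(dflist), counts) without replacement avoids the problem altogether.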
Re: [R] Randomly interleaving data frames while preserving order
On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
> What I need to do is construct a data frame with all of these combined
> where the order of selection from one of the three data frames is
> randomized but once a stratum is selected patients are selected
> sequentially from that data frame.

How about something like this:

# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

# Put the original dataframes into the appropriate slots:
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3

# Clean up the rownames
rownames(df) <- NULL

Duncan Murdoch
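To make the approach above general in the number of strata, one possibility is to permute one stratum label per row and then send each stacked row to the position of its stratum's next draw. This is a sketch, assuming all data frames share the same columns; interleave_strata() is a hypothetical name, and the split()/order() step is a swapped-in technique rather than the one in the reply.

```r
interleave_strata <- function(dflist) {
  sizes <- vapply(dflist, nrow, integer(1))
  labels <- rep(seq_along(dflist), sizes)
  # Random recruitment order of the strata (sample.int avoids the sample(n) gotcha)
  sel <- labels[sample.int(length(labels))]
  # Positions in the final order, grouped by stratum; each group is increasing
  pos <- unlist(split(seq_along(sel), factor(sel, levels = seq_along(dflist))),
                use.names = FALSE)
  # Stacked row r belongs at position pos[r]; order(pos) inverts that mapping
  stacked <- do.call(rbind, dflist)
  out <- stacked[order(pos), ]
  rownames(out) <- NULL
  out
}

df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(123)
res <- interleave_strata(list(df1, df2, df3))
head(res)
```

Within each stratum the ids come out as 1, 2, 3, ..., which is the "sequential once selected" requirement.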
[R] idiom for constructing data frame
Hi folks,

I KNOW there has to be a way to do this more elegantly, but I consistently fail to come up with it, as I was just reminded while writing an example for a query on this list.

What's a nifty way to construct a data frame of a given size? The only way I know of is to use matrix(), e.g.

data.frame(matrix(NA, nrow=10, ncol=3))

and then to set the colnames in a second step.

This comes up a lot when pre-allocating a data frame before using a loop: I know the size and column names, but want an empty structure to fill later.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
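One way to fold the second step into the same expression is stats::setNames(), which returns its first argument with the names replaced. A minimal sketch, taking the column names from the thread's example as an assumption:

```r
# Column names known in advance (from the recruitment example in this thread)
col_names <- c("strat", "id", "pid")

# Pre-allocate 10 rows of NA and name the columns in one expression
empty_df <- setNames(data.frame(matrix(NA, nrow = 10, ncol = length(col_names))),
                     col_names)

str(empty_df)
```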
Re: [R] idiom for constructing data frame
I just snagged this from Duncan Murdoch's reply to the same question:

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

This skips matrix(), but how to set the column names programmatically within a function?

Sarah, still sure I'm missing something obvious

On Tue, Mar 31, 2015 at 1:46 PM, Sarah Goslee wrote:
> What's a nifty way to construct a data frame of a given size?

--
Sarah Goslee
http://www.functionaldiversity.org
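The one-row template trick above can take its column names from a character vector by building the template from a named list. A sketch, with the helper name make_template_df() as a hypothetical choice. Note that as.data.frame() on a list checks names by default, so non-syntactic names would be altered.

```r
make_template_df <- function(col_names, n) {
  # A one-row data frame of NAs whose columns are named from col_names
  one_row <- as.data.frame(setNames(rep(list(NA), length(col_names)), col_names))
  # Repeat that row n times, as in data.frame(strat=NA, ...)[rep(1, n), ]
  out <- one_row[rep(1, n), , drop = FALSE]
  rownames(out) <- NULL
  out
}

empty_df <- make_template_df(c("strat", "id", "pid"), 30)
str(empty_df)
```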
Re: [R] Randomly interleaving data frames while preserving order
On 03/31/2015 01:44 PM, Duncan Murdoch wrote:
> How about something like this:

Thanks Duncan.

Once you see the solution it is indeed obvious.

Kevin

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
Re: [R] Randomly interleaving data frames while preserving order
# Random recruitment order of the strata
samples<-sample(c(rep(1,10),rep(2,10),rep(3,10)),30)
# Replace each stratum label by that stratum's pids, in order
samples[samples==1]<-1001:1010
samples[samples==2]<-2001:2010
samples[samples==3]<-3001:3010
# Stack the strata and reorder the rows to follow the pids in samples
fullDf<-rbind(df1,df2,df3)
fullDf[sort(order(samples),index.return=TRUE)$ix,]

On Tue, 2015-03-31 at 13:05 -0400, Kevin E. Thorpe wrote:
> What I need to do is construct a data frame with all of these combined
> where the order of selection from one of the three data frames is
> randomized but once a stratum is selected patients are selected
> sequentially from that data frame.
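The last line above reorders the stacked rows so that row k carries pid samples[k]. A minimal base-R check, with a fixed seed as an assumption, that the sort(..., index.return = TRUE)$ix expression equals order(order(samples)), and that match() gives the same row indices more directly:

```r
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)
fullDf <- rbind(df1, df2, df3)

set.seed(42)
samples <- sample(c(rep(1, 10), rep(2, 10), rep(3, 10)), 30)
# Replace each stratum label by that stratum's pids, in recruitment order
samples[samples == 1] <- 1001:1010
samples[samples == 2] <- 2001:2010
samples[samples == 3] <- 3001:3010

idx_sort  <- sort(order(samples), index.return = TRUE)$ix  # expression from the reply
idx_rank  <- order(order(samples))                          # inverse permutation
idx_match <- match(samples, fullDf$pid)                     # look up each pid directly

recruited <- fullDf[idx_match, ]
```

All three index vectors agree because fullDf is already sorted by pid.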
Re: [R] idiom for constructing data frame
On 31/03/2015 1:52 PM, Sarah Goslee wrote:
> I just snagged this from Duncan Murdoch's reply to the same question:
>
> # Create an empty dataframe to hold the results
> df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
>
> This skips matrix(), but how to set the column names programmatically
> within a function?
>
> Sarah, still sure I'm missing something obvious

The matrix() function has a dimnames argument, so you could do this:

names <- c("strat", "id", "pid")
data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))

Duncan Murdoch
Re: [R] Randomly interleaving data frames while preserving order
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kevin
> E. Thorpe
> Sent: Tuesday, March 31, 2015 10:53 AM
> To: Duncan Murdoch
> Cc: R Help Mailing List
> Subject: Re: [R] Randomly interleaving data frames while preserving
> order
>
> Thanks Duncan.
>
> Once you see the solution it is indeed obvious.
>
> Kevin

Another option would be to stack your strata and then sample from the combined data frame, something like this:

sample_size <- 10
population <- rbind(df1,df2,df3)
sim.sample <- population[sample(nrow(population),sample_size, replace=FALSE),]

Hope this is helpful,

Dan

Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services
Re: [R] data.frame: data-driven column selections that vary by row??
Hi David,

I suggest reading http://www.jstatsoft.org/v59/i10, then:

library(tidyr)
library(dplyr)
bw <- gather(bw, key = "tmp", value = "value", matches("^d[a-z]+[0-9]+"))
bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
bw <- spread(bw, var, value)

Best,
Ista

On Tue, Mar 31, 2015 at 1:22 PM, David Wolfskill wrote:
> On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
>> I think we need some data and code
>
> I apologize for failing to provide that.
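For readers without tidyr, the same wide-to-long-to-wide reshaping can be sketched in base R: one block of rows per disk prefix, with the prefix moved into a disc column. This assumes, as in the data above, that every disk prefix carries the same set of metric suffixes; bw_toy is a hypothetical four-metric excerpt, not the full data.

```r
bw_toy <- data.frame(
  timestamp     = c(1426892400L, 1426892400L),
  hostname      = c("c001", "c021"),
  da20_busy_pct = c(79.1, NA),
  da20_Q_length = c(0, NA),
  da2_busy_pct  = c(84.5, 29.5),
  da2_Q_length  = c(0, 0),
  stringsAsFactors = FALSE
)

disk_cols <- grep("^d[a-z]+[0-9]+_", names(bw_toy), value = TRUE)
id_cols   <- setdiff(names(bw_toy), disk_cols)
# Split each name at the first "_", like separate(..., extra = "merge")
disc <- sub("_.*$", "", disk_cols)     # e.g. "da20"
var  <- sub("^[^_]+_", "", disk_cols)  # e.g. "busy_pct", "Q_length"

# One block of rows per disk, with the disk prefix stripped from the names
long <- do.call(rbind, lapply(unique(disc), function(d) {
  part <- bw_toy[, c(id_cols, disk_cols[disc == d])]
  names(part) <- c(id_cols, var[disc == d])
  part$disc <- d
  part
}))
rownames(long) <- NULL
print(long)
```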
Re: [R] data.frame: data-driven column selections that vary by row??
Nice clean-up!!!

On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote:
> library(tidyr)
> library(dplyr)
> bw <- gather(bw, key = "tmp", value = "value",
> matches("^d[a-z]+[0-9]+"))
> bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
> bw <- spread(bw, var, value)
Re: [R] data.frame: data-driven column selections that vary by row??
Not entirely sure I understand your problem here (your first email was a lot of reading). Would it make sense to add an extra column holding the device name? You would then end up with something like:

Host    Device  Type
host_A  ada0    ssd
host_A  ada1    ssd
host_A  ada2    hdd
...
host_N  da3     ssd

You could then subset this data frame:

subset(data,Type=="ssd" & Device=="ada0")

On Tue, 2015-03-31 at 10:22 -0700, David Wolfskill wrote:
> On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
> > I think we need some data and code
>
> I apologize for failing to provide that.
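A small sketch of that suggestion, using a hypothetical long-format data frame disk_info with one row per host and device (the host and device names are placeholders following the table above):

```r
disk_info <- data.frame(
  Host   = c("host_A", "host_A", "host_A", "host_N"),
  Device = c("ada0", "ada1", "ada2", "da3"),
  Type   = c("ssd", "ssd", "hdd", "ssd"),
  stringsAsFactors = FALSE
)

# All SSDs, regardless of which device slot they occupy
ssd_rows <- subset(disk_info, Type == "ssd")

# A single device of a given type
ada0_ssd <- subset(disk_info, Type == "ssd" & Device == "ada0")
print(ada0_ssd)
```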
Re: [R] idiom for constructing data frame
Hi,

Duncan Murdoch suggested:

> The matrix() function has a dimnames argument, so you could do this:
>
> names <- c("strat", "id", "pid")
> data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))

That's a definite improvement, thanks. But no way to skip matrix()? It just seems unRlike, although since it's only full of NA values there are no coercion issues with column types or anything, so it doesn't hurt. It's just inelegant. :)

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
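If the aim is to skip matrix() and also fix each column's type up front, one option is to start from a named list of typed NA prototypes and replicate each one. This is a sketch; the prototype list and the helper name empty_typed_df() are assumptions, not something from the thread. The second part shows why the matrix() route still works: its columns start out logical and are coerced on the first assignment.

```r
empty_typed_df <- function(prototypes, n) {
  # rep() each typed NA n times; data.frame() turns the named list into columns
  data.frame(lapply(prototypes, rep, times = n))
}

proto <- list(strat = NA_real_, id = NA_integer_, pid = NA_integer_)
df <- empty_typed_df(proto, 10)
sapply(df, class)

# The matrix() version starts with a logical column...
m_df <- data.frame(matrix(NA, nrow = 2, ncol = 1, dimnames = list(NULL, "id")))
class(m_df$id)   # "logical"
# ...which is coerced as a whole when an integer is assigned into it
m_df$id[1] <- 5L
class(m_df$id)   # "integer"
```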
[R] Does fitCopula work for amhCopula and joeCopula?
Good evening,

this is part of my routine that calculates the copula parameter and log-likelihood for each pair of rows of a data matrix, choosing, for each pair, the copula that gives the maximum likelihood.

If I run the computation with only:

f <- frankCopula(2,2)
g <- gumbelCopula(2,2)
c <- claytonCopula(2,2)

the program works correctly and gives the expected results. If I also add:

a <- amhCopula(1,2)
j <- joeCopula(2,2)

then the program doesn’t work anymore.

I tried it on samples such as:

n <- 1000
f <- frankCopula(20,2)
x_1 <- rCopula(n,f)
f <- gumbelCopula(50,2)
x_2 <- rCopula(n,f)
f <- joeCopula(70,2)
x_3<- rCopula(n,f)
x <- cbind(x_1, x_2, x_3)
data <- t(x)
dim <- dim(data)[1]

Here is the relevant part of the code of Routine_Copula:

Routine_Copula <- function(data,dim){
  library(copula)
  library(gtools)
  n <- dim(data)[1]; # number of rows of the input matrix
  m <- dim(data)[2]; # number of columns of the input matrix
  # Probability integral transform of the data
  ecdf <- matrix(0,n,m);
  for (i in 1:n){
    e <- matrix(data[i,],m,1);
    #ecdf[i,] <- pobs(e);
    ecdf[i,] <- pobs(e, na.last=TRUE);
    #na.last for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.
  }
  f <- frankCopula(2,2)
  g <- gumbelCopula(2,2)
  c <- claytonCopula(2,2)
  a <- amhCopula(1,2)
  j <- joeCopula(2,2)
  [….]
  for (j in 1:n_comb){
    input <- t(ecdf[comb[,j],])
    try(summary <- fitCopula(f,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,1] <- summary@estimate;
    resmatllk[j,1] <- summary@loglik;
    try(summary <- fitCopula(g,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,2] <- summary@estimate;
    resmatllk[j,2] <- summary@loglik;
    try(summary <- fitCopula(c,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,3] <- summary@estimate;
    resmatllk[j,3] <- summary@loglik;
    try(summary <- fitCopula(a,input,method='mpl',start=1),silent=TRUE);
    resmatpar[j,4] <- summary@estimate;
    resmatllk[j,4] <- summary@loglik;
    try(summary <- fitCopula(j,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,5] <- summary@estimate;
    resmatllk[j,5] <- summary@loglik;
    d <- c(resmatllk[j,1],resmatllk[j,2],resmatllk[j,3],resmatllk[j,4],resmatllk[j,5]);
    copchoice[j] <- which(d==max(d));
    param[j] <- resmatpar[j,copchoice[j]];
    loglik[j] <- resmatllk[j,copchoice[j]];
  }

Thank you

Laura Gianfagna
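Two base-R behaviours in the loop above may be worth ruling out before suspecting fitCopula() itself; this is a hedged reading of the excerpt, not a confirmed diagnosis. First, the loop counter j overwrites the object created by j <- joeCopula(2,2), so the fifth fitCopula() call receives an integer. Second, when try(..., silent = TRUE) catches an error, the assignment inside it never happens, so summary still holds the previous fit and its estimate and log-likelihood are copied into the results. The sketch reproduces both with plain stand-ins (no copula package needed; fit_that_fails() is hypothetical).

```r
# Stand-in for j <- joeCopula(2, 2)
j <- list(family = "joe")

# 1. A for loop reuses j as its counter, replacing the object defined above
for (j in 1:3) {
  # loop body omitted
}
class(j)   # "integer"

# 2. A failed assignment inside try() leaves the old value in place
fit_that_fails <- function() stop("fit did not converge")  # hypothetical failing fit
summary_fit <- "result of the previous fit"
try(summary_fit <- fit_that_fails(), silent = TRUE)
summary_fit   # still "result of the previous fit"

# Checking the class of try()'s return value makes the failure visible
res <- try(fit_that_fails(), silent = TRUE)
failed <- inherits(res, "try-error")
```

Giving the Joe copula object a name other than the loop counter, and testing inherits(res, "try-error") before copying @estimate and @loglik, would separate these effects from any genuine fitting problem.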
Re: [R] idiom for constructing data frame
You can make it as elegant as you want, e.g.,

make.empty.df <- function(nrow, ncol, names) {
  if (length(names) != ncol)
    stop("Length of names must equal the number of columns")
  data.frame(matrix(NA, nrow, ncol, dimnames = list(NULL, names)))
}

Best,
Ista

On Tue, Mar 31, 2015 at 2:37 PM, Sarah Goslee wrote:
> Hi,
>
> Duncan Murdoch suggested:
>
>> The matrix() function has a dimnames argument, so you could do this:
>>
>> names <- c("strat", "id", "pid")
>> data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
>
> That's a definite improvement, thanks. But no way to skip matrix()? It
> just seems unRlike, although since it's only full of NA values there
> are no coercion issues with column types or anything, so it doesn't
> hurt. It's just inelegant. :)
>
> Sarah
> --
> Sarah Goslee
> http://www.functionaldiversity.org
Re: [R] idiom for constructing data frame
You can use structure() to attach the names to a list that is input to data.frame. E.g.,

dfNames <- c("First", "Second Name")
data.frame(lapply(structure(dfNames, names=dfNames), function(name) rep(NA_real_, 5)))

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee wrote:
> Hi,
>
> Duncan Murdoch suggested:
> ...
[R] Fwd: non-conformable arguments
Dear All,

I want to run a neural network on my data. I ran this code:

#load mydata
dim(mydata) # 20 3111
library(neuralnet)
fm <- as.formula(paste("resp ~", paste(colnames(mydata)[1:3110], collapse="+")))
out <- neuralnet(fm, data=mydata, hidden = 4, lifesign = "minimal",
                 linear.output = FALSE, threshold = 0.1)
#load testset
dim(testset) # 20 3111
out.results <- compute(out, testset)

Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments

What should I do now?

Regards,
Soheila
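A likely cause, assuming `testset` has the same 3111 columns as `mydata` (including `resp`): `compute()` is documented to take only the covariates that were used for training, while the first weight matrix has one row per covariate plus one row for the bias. Passing the response as well adds an extra column, and the matrix product in the error message no longer lines up. The toy sketch below uses hypothetical sizes and plain matrices (no neuralnet) to mimic that product, assuming the bias is added as a leading column of ones.

```r
## Toy dimensions (hypothetical): p covariates, 4 hidden neurons.
set.seed(1)
p <- 3
w1 <- matrix(0.1, nrow = p + 1, ncol = 4)   # first-layer weights incl. bias row
testset <- as.data.frame(matrix(runif(5 * (p + 1)), nrow = 5))
names(testset) <- c(paste0("x", seq_len(p)), "resp")

## Response left in: cbind(1, ...) has p + 2 columns, but w1 has p + 1 rows.
msg <- tryCatch(cbind(1, as.matrix(testset)) %*% w1,
                error = function(e) conditionMessage(e))

## Covariates only: the product is conformable (5 rows x 4 hidden neurons).
covariates <- testset[, setdiff(names(testset), "resp")]
hidden_in <- cbind(1, as.matrix(covariates)) %*% w1
```

For the original data this would suggest something like `compute(out, testset[, colnames(mydata)[1:3110]])`, untested and assuming the column names of `testset` match those of `mydata`.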
[R] How to obtain a cross tab count of unique values
I have a data frame that shows all of the parks (including duplicates) that are impacted by a project's 'footprint':

PROJECT PARKNAME
A       PRK A
A       PRK B
A       PRK A
B       PRK C
B       PRK A
C       PRK B
C       PRK D
...

What I need is a cross tabulation that shows me the number of unique parks for each project. If I use the standard table(df$PROJECT), it reports:

A 3
B 2
C 2
...

whereas I need it to ignore duplicates and report:

A 2
B 2
C 2
...

Does anyone have suggestions on how to do this within the R paradigm?

Walter Anderson
Re: [R] How to obtain a cross tab count of unique values
Sure: tell R you want unique rows.

> mydf <- data.frame(PROJECT=c("A","A","A","B","B","C","C"), PARKNAME=c("PRK A", "PRK B", "PRK A", "PRK C", "PRK A", "PRK B", "PRK D"), stringsAsFactors=FALSE)
> mydf
  PROJECT PARKNAME
1       A    PRK A
2       A    PRK B
3       A    PRK A
4       B    PRK C
5       B    PRK A
6       C    PRK B
7       C    PRK D
> mydf.unique <- unique(mydf)
> table(mydf.unique$PROJECT)

A B C
2 2 2

Please provide reproducible data yourself in the future.

Sarah

On Tue, Mar 31, 2015 at 3:51 PM, Walter Anderson wrote:
> I have a data frame that shows all of the parks (including duplicates)
> that are impacted by a projects 'footprint':
> ...

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] How to obtain a cross tab count of unique values
Hello,

Try the following.

table(unique(df)$PROJECT)

And please note that 'df' is the name of an R function; use something else.

Hope this helps,

Rui Barradas

On 31-03-2015 20:51, Walter Anderson wrote:
> I have a data frame that shows all of the parks (including duplicates)
> that are impacted by a projects 'footprint':
> ...
Re: [R] How to obtain a cross tab count of unique values
table(unique(df)$PROJECT)

On Tue, 2015-03-31 at 14:51 -0500, Walter Anderson wrote:
> I have a data frame that shows all of the parks (including duplicates)
> that are impacted by a projects 'footprint':
> ...
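A caveat on the unique() answers: unique() drops duplicated rows, so table(unique(df)$PROJECT) counts distinct parks only when PROJECT and PARKNAME are the only columns. If the real data frame has other columns (the AREA column below is hypothetical), two rows for the same project and park can differ elsewhere and both survive. Restricting unique() to the two columns of interest, or counting distinct park names per project with tapply(), avoids that. A sketch using the example data from the question:

```r
mydf <- data.frame(PROJECT  = c("A", "A", "A", "B", "B", "C", "C"),
                   PARKNAME = c("PRK A", "PRK B", "PRK A", "PRK C",
                                "PRK A", "PRK B", "PRK D"),
                   AREA     = c(1, 2, 3, 4, 5, 6, 7),  # hypothetical extra column
                   stringsAsFactors = FALSE)

## unique() on all columns: the two "A / PRK A" rows differ in AREA, so both stay.
all_cols <- table(unique(mydf)$PROJECT)

## Restrict unique() to the columns that define a duplicate.
two_cols <- table(unique(mydf[c("PROJECT", "PARKNAME")])$PROJECT)

## Same counts via tapply(): number of distinct park names per project.
per_project <- tapply(mydf$PARKNAME, mydf$PROJECT, function(p) length(unique(p)))
```

Here all_cols still reports 3 for project A, while two_cols and per_project both give 2 for each project.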
Re: [R] idiom for constructing data frame
If you don't mind an extra column, you could use something similar to:

data.frame(r=seq(8),foo=NA,bar=NA)

If you do, here is another approach (see function body):

empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
  data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
}
empty.frame()
empty.frame(, seq(3))
empty.frame(8, c("foo", "bar"))

I could not put it in one line either without retyping at least one argument (n in this case), so I suggest a function is the way to go for a simplified syntax.

Thanks to all for the ideas!
Sven

On 31 March 2015 at 20:55, William Dunlap wrote:
> You can use structure() to attach the names to a list that is input to
> data.frame.
> ...
Re: [R] idiom for constructing data frame
I got rid of the extra column.

data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")

Rich

On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer wrote:
> If you don't mind an extra column, you could use something similar to:
>
> data.frame(r=seq(8),foo=NA,bar=NA)
> ...
Re: [R] Can not load Rcmdr
I have a similar issue with tcl. I am using R on a Linux server. Rcmdr installed OK, but it won't run:

> R.Version()
$platform
[1] "x86_64-unknown-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "1.0"

$year
[1] "2014"

$month
[1] "04"

$day
[1] "10"

$`svn rev`
[1] "65387"

$language
[1] "R"

$version.string
[1] "R version 3.1.0 (2014-04-10)"

$nickname
[1] "Spring Dance"

> library(Rcmdr)
Error : .onAttach failed in attachNamespace() for 'Rcmdr', details:
  call: structure(.External(.C_dotTcl, ...), class = "tclObj")
  error: [tcl] Invalid state name hover.
Error: package or namespace load failed for 'Rcmdr'

This is kind of frustrating because I don't have admin privileges to install RStudio on this server, either. I guess it's time to use Emacs.
[R] Using matlab code in R
Hi everybody,

I have some MATLAB code which I would like to use for my empirical analysis. Unfortunately, I am not familiar with MATLAB, and it would be great if there were a tool to "translate" the MATLAB code into R so that I can work with it in R. Is there such a tool or package in R?

Kind regards,
T.
[R] Calculating different PCAs in R
Dear All,

I want to use the princomp() function in R in order to perform a Principal Component Analysis. In different papers, I have seen "PCA 1", "PCA 2", "PCA 11", etc. Would you please tell me how I can calculate these different principal components in R? At the moment I just use this line:

eigenVectors <- pca$loadings

but I don't know whether it is correct to use the loadings. Thank you in advance.

Best regards,
Iman Dabbaghi
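For reference, a minimal sketch on simulated data (the matrix X is hypothetical). With princomp(), the columns of `loadings` (Comp.1, Comp.2, ...) are the eigenvectors, and the columns of `scores` are the principal components themselves, i.e. the centred data projected onto those eigenvectors. What a given paper calls "PCA 1" is most plausibly the first column of the scores, but that depends on the paper.

```r
set.seed(42)
X <- matrix(rnorm(200), ncol = 4)      # hypothetical data: 50 observations, 4 variables
pca <- princomp(X)

eigenVectors <- unclass(pca$loadings)  # 4 x 4 matrix; columns Comp.1 ... Comp.4
pc1 <- pca$scores[, 1]                 # values of the first principal component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)  # share of variance per component

## The scores are the centred data multiplied by the eigenvectors.
manual <- scale(X, center = pca$center, scale = FALSE) %*% eigenVectors
```

The signs of the eigenvectors are arbitrary, so a component can come out with the opposite sign from a published one without anything being wrong.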
Re: [R] Using matlab code in R
The Posting Guide recommends searching the archives before posting. Consider [1] and learn.

[1] https://stat.ethz.ch/pipermail/r-help/2007-March/127981.html
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On March 31, 2015 1:47:49 PM PDT, "T.Riedle" wrote:
>Hi everybody,
>I have a matlab code which I would like to use for my empirical
>analysis.
>...
Re: [R] idiom for constructing data frame
On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger wrote:
> I got rid of the extra column.
>
> data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")

Brilliant!

After much fussing, including a disturbing detour into nested lapply statements from which I barely emerged with my sanity (arguable, I suppose), here is a one-liner that creates a data frame with an arbitrary number of rows, given an existing data frame as a template for the number and names of the columns:

n <- 8
df1 <- data.frame(A=runif(9), B=runif(9))

do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA, ncol(df1)))), c("r", "row.names", colnames(df1))))

It's not elegant, but it is fairly R-ish. I should probably stop hunting for an elegant solution now.

Thanks, everyone!

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
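One more possibility, not suggested in the thread: index the template data frame with NA row numbers. Each NA index yields a row of NAs, and, unlike the NA-filled constructions in this thread (which produce logical columns), the columns keep the template's types. A sketch, reusing Sarah's example df1 and n with a character column C added here to show the type preservation:

```r
n <- 8
df1 <- data.frame(A = runif(9), B = runif(9), C = letters[1:9],
                  stringsAsFactors = FALSE)

## Every NA row index returns a row of NAs; columns keep their classes.
empty <- df1[rep(NA_integer_, n), , drop = FALSE]
rownames(empty) <- NULL   # replace the "NA", "NA.1", ... row names with 1..n

## Column classes follow the template: numeric, numeric, character.
sapply(empty, class)
```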
Re: [R] idiom for constructing data frame
I've got dataFrame() in R.utils for this purpose, e.g.

> df <- dataFrame(colClasses=c(a="integer", b="double", c="character"), nrow=10L)
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ a: int  0 0 0 0 0 0 0 0 0 0
 $ b: num  0 0 0 0 0 0 0 0 0 0
 $ c: chr  "" "" "" "" ...

Related: You can use the colClasses() function to generate the 'colClasses' argument "dynamically", e.g.

> cols <- colClasses("idc")
> names(cols) <- c("a", "b", "c")
> str(cols)
 Named chr [1:3] "integer" "double" "character"
 - attr(*, "names")= chr [1:3] "a" "b" "c"

> cols <- colClasses(sprintf("c2d%di", 4))
> df <- dataFrame(colClasses=cols, nrow=10L)
> str(df)
'data.frame':   10 obs. of  7 variables:
 $ : chr  "" "" "" "" ...
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0

dataFrame() is basically implemented as:

dataFrame <- function(colClasses, nrow=1L, ...) {
  df <- vector("list", length=length(colClasses))
  names(df) <- names(colClasses)
  for (kk in seq(along=df)) {
    df[[kk]] <- vector(colClasses[kk], length=nrow)
  }
  attr(df, "row.names") <- seq(length=nrow)
  class(df) <- "data.frame"
  df
} # dataFrame()

/Henrik

On Tue, Mar 31, 2015 at 4:42 PM, Sarah Goslee wrote:
> On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger wrote:
>> I got rid of the extra column.
> ...
[R] Edit plot adehabitatHS
Dear R experts,

I am doing a selectivity analysis using the adehabitatHS package. However, I am having some trouble, and I would like to know whether somebody can help me:

1) Changing the x-axis labels: the program uses the name "habitat" instead of the names specified in the file.
2) Is it possible to edit this plot (#3, bottom-left)? For example, is it possible to change the symbol style, or the legend position or size?

I do not know how to edit this kind of output; if somebody has an example using this package, I would really appreciate it. Thanks!!

Please find attached the plot. Here are the script and the respective data:

library(adehabitatHS)
pse <- read.table("pseudos.txt", header=T)
attach(pse)
names(pse)
head(pse)
(wiRatio <- widesI(Diet, Dis))
png(filename = "plotpseudos3.png", width = 500, height = 500)
opar <- par(mfrow=c(2,2))
plot(wiRatio)
par(opar)
dev.off()

MSp     Orden        Dis  Diet
MSp52   Hemiptera     31     2
MSp84   Hemiptera      2     1
MSp92   Hymenoptera   47     2
MSp100  Hymenoptera   19     1
MSp101  Hymenoptera   31    28
MSp102  Hymenoptera   83    15
MSp104  Hymenoptera   77    40
MSp105  Hymenoptera  110     9
MSp106  Hymenoptera   41     3
MSp107  Hymenoptera    1     3
MSp108  Hymenoptera    1     2
MSp109  Hymenoptera    1     1
MSp110  Hymenoptera    1     1
MSp143  Mantodea       1     1
MSp164  Neuroptera     5     1
MSp176  Araneae        6     1
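One workaround, rather than editing the output of plot(wiRatio), is to compute the selection ratios directly and draw them with base graphics, where the symbols, axis labels and legend are all under your control. This sketch assumes the ratios of interest are Manly's selection ratios, w_i = (proportion used) / (proportion available), with Diet as "used" and Dis as "available"; compare the values with those printed by widesI() before relying on it. It uses the data posted above, and the output file name is hypothetical.

```r
## Data as posted (Dis = available, Diet = used).
pse <- data.frame(
  MSp  = c("MSp52", "MSp84", "MSp92", "MSp100", "MSp101", "MSp102", "MSp104",
           "MSp105", "MSp106", "MSp107", "MSp108", "MSp109", "MSp110",
           "MSp143", "MSp164", "MSp176"),
  Dis  = c(31, 2, 47, 19, 31, 83, 77, 110, 41, 1, 1, 1, 1, 1, 5, 6),
  Diet = c(2, 1, 2, 1, 28, 15, 40, 9, 3, 3, 2, 1, 1, 1, 1, 1)
)

## Manly's selection ratio: proportion used / proportion available.
wi <- setNames((pse$Diet / sum(pse$Diet)) / (pse$Dis / sum(pse$Dis)), pse$MSp)

## Base-graphics plot with custom labels, symbols and legend.
pdf("wi_custom.pdf", width = 7, height = 6)
op <- par(mar = c(7, 4, 2, 1))
x <- seq_along(wi)
plot(x, wi, pch = 17, col = "darkgreen", xaxt = "n",
     xlab = "", ylab = "Selection ratio (wi)")
axis(1, at = x, labels = names(wi), las = 2, cex.axis = 0.8)
abline(h = 1, lty = 2)   # wi = 1: used in proportion to availability
legend("topright", legend = c("wi", "wi = 1"), pch = c(17, NA),
       lty = c(NA, 2), col = c("darkgreen", "black"), cex = 0.8, bty = "n")
par(op)
dev.off()
```

The same calls work with png() instead of pdf(); pch, cex and the legend() position are the knobs for symbol style and legend placement.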