Hello,

I've downloaded the tar.gz file of the package "lme4", and when I use the command

install.packages("lme4_1.1-8.tar.gz", repos = NULL, type = "source")

an error appears that aborts the installation:

In file included from external.cpp:8:0:
predModule.h:12:23: fatal error: RcppEigen.h: No such file or directory
compilation terminated.
make: *** [external.o] Error 1
ERROR: compilation failed for package 'lme4'
* removing '/home/aurora/R/x86_64-pc-linux-gnu-library/3.2/lme4'

Does anyone know how to fix it? Thank you very much!

My sessionInfo:

R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

loaded via a namespace (and not attached):
[1] tools_3.2.1
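The compiler message above points at a missing header, RcppEigen.h, which is shipped by the RcppEigen package; with repos = NULL, install.packages() does not fetch dependencies. A minimal sketch of checking for such packages before building from source follows. The helper name missing_pkgs is hypothetical, and the list of packages is an assumption based only on the error shown in the post:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumption: these are the packages whose headers the build needs.
needed <- missing_pkgs(c("Rcpp", "RcppEigen"))

if (length(needed) > 0) {
  message("Install these first: ", paste(needed, collapse = ", "))
  # install.packages(needed)   # uncomment to install from CRAN
} else {
  message("Headers should be available; retry the source install.")
}
```

Once nothing is reported as missing, the original install.packages() call on the tar.gz file can be retried.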
Hello everybody.

I am using the GA package [1] to optimize the hyperparameters of an SVM, as is done in this example:
http://stackoverflow.com/questions/32026436/how-to-optimize-parameters-using-genetic-algorithms

However, when I try to adapt the example for random forest, the optimization takes a very long time. It might be because the hyperparameters of random forest are integers (ntree, mtry, nodesize), but I don't know if there is a way to specify this in the algorithm. Any suggestion would be very much appreciated. Thank you!

The code:

library(GA)
library("randomForest")
data(Ozone, package = "mlbench")
Data <- na.omit(Ozone)

# Set up the data for cross-validation
K <- 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
  train_data = Data[fold_inds != i, , drop = FALSE],
  test_data = Data[fold_inds == i, , drop = FALSE]))

# Given the values of parameters 'ntree', 'mtry' and 'nodesize',
# return the rmse of the model over the test data
evalParamsRF <- function(train_data, test_data, ntree, mtry, nodesize) {
  # Train
  model <- randomForest(V4 ~ ., data = train_data, ntree = ntree,
                        mtry = mtry, nodesize = nodesize, proximity = TRUE)
  # Test (square root added so that the value really is an RMSE)
  rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
  return (rmse)
}

fitnessFuncRF <- function(x, Lst_CV_Data) {
  # Retrieve the RF parameters
  ntree_val <- x[1]
  mtry_val <- x[2]
  nodesize_val <- x[3]
  # Use cross-validation to estimate the RMSE for each split of the dataset
  rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
    evalParamsRF(train_data, test_data, ntree_val, mtry_val, nodesize_val)))
  # As fitness measure, return minus the average rmse (over the cross-validation folds),
  # so that by maximizing fitness we are minimizing the rmse
  return (-mean(rmse_vals))
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFuncRF, lst_CV_data,
              names = names(theta_min), min = theta_min, max = theta_max,
              popSize = 50, maxiter = 10)
summary(results)
summary(results)$solution

Links:
--
[1] https://cran.r-project.org/web/packages/GA/index.html
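One common way to handle integer hyperparameters with a real-valued search is to round each candidate inside the fitness function. The sketch below shows only that rounding step, in base R; the helper as_int_params and the toy objective are hypothetical stand-ins, not part of the GA package or of the post's code:

```r
# Hypothetical helper: map a real-valued chromosome to integer parameters,
# clamped to the search bounds.
as_int_params <- function(x, lower, upper) {
  pmin(pmax(round(x), lower), upper)
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Stand-in for an expensive fitness (assumption: a toy objective instead of
# the random forest fit). It only ever sees integer values.
toy_fitness <- function(x) {
  p <- as_int_params(x, theta_min, theta_max)
  -sum((p - c(500, 4, 10))^2)
}

toy_fitness(c(499.6, 4.3, 9.8))   # evaluated at (500, 4, 10)
```

In the post's code the same rounding would go at the top of fitnessFuncRF, before the values are handed to randomForest. Whether this shortens the run is not something the sketch can show; it only keeps the model from receiving fractional values.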
Dear all,

I'm trying to cluster some data using the SAX distance described in the paper "A symbolic representation of time series, with implications for streaming algorithms":
http://www.cs.ucr.edu/~eamonn/SAX.pdf

Once I have my data in matrix format, which function can I use to compute the dissimilarity matrix? There are several functions that compute the distance between two SAX series:

diss.MINDIST.SAX(x, y, w, alpha, plot=TRUE)
Func.dist(x, y, matrix, n)

but filling the matrix with two nested loops is very slow, and I suspect an implementation already exists. Do you have any idea?

I have already converted the data into series of symbols ("a", "b", "c", etc.), so I would appreciate either a direct computation of the SAX matrix from my raw data or one that uses the data already converted to SAX format.

Thank you for any suggestion!
Hello,

I have two for loops that I am trying to optimize. I looked into vectorization and into the apply family of functions, but I could not make it work. I am writing my code with a small data set. At this size there is no problem, but sometimes I will have hundreds of rows, so it is really important to optimize the code. Any suggestion will be very welcome.

library("TSMining")
dataS = data.frame(V1 = sample(c(1,2,3,4), 30, replace = T),
                   V2 = sample(c(1,2,3,4), 30, replace = T),
                   V3 = sample(c(1,2,3,4), 30, replace = T),
                   V4 = sample(c(1,2,3,4), 30, replace = T))
saxM = Func.matrix(5)
colnames(saxM) = 1:5
rownames(saxM) = 1:5
matrixPrepared = matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))

# Fill the upper triangle with the pairwise distances
for(i in 1:(nrow(dataS)-1)){
  for(j in (1+i):nrow(dataS)){
    matrixPrepared[i,j] = Func.dist(as.character(dataS[i,]),
                                    as.character(dataS[j,]), saxM, n=60)
  }
}
matrixPrepared

Thank you!
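A sketch of one way to restructure the double loop above: convert the rows to character vectors once, then apply the distance to every pair of row indices generated by combn(). The helper pairwise_upper and the mismatch distance are hypothetical stand-ins (the latter replaces Func.dist so that the example runs without TSMining), and the data frame is a smaller made-up one:

```r
# Hypothetical helper: fill the upper triangle of a distance matrix.
# `rows` is a list of vectors; `dist_fun` takes two of them and returns a number.
pairwise_upper <- function(rows, dist_fun) {
  n <- length(rows)
  out <- matrix(NA_real_, nrow = n, ncol = n)
  idx <- combn(n, 2)                       # 2 x choose(n, 2) index pairs (i < j)
  out[t(idx)] <- apply(idx, 2, function(p) dist_fun(rows[[p[1]]], rows[[p[2]]]))
  out
}

set.seed(1)
dataS <- data.frame(V1 = sample(1:4, 6, replace = TRUE),
                    V2 = sample(1:4, 6, replace = TRUE),
                    V3 = sample(1:4, 6, replace = TRUE))

# Convert once to a character matrix, then split into one vector per row.
m <- as.matrix(dataS)
storage.mode(m) <- "character"
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# Stand-in distance (assumption): number of positions where the symbols differ.
mismatches <- function(a, b) sum(a != b)

d <- pairwise_upper(rows, mismatches)
d
```

The number of distance evaluations is the same as in the loop version, so most of any gain would come from no longer extracting and converting data-frame rows inside the loop; the distance function itself remains the dominant cost.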
Hello everybody.

I have an "aesthetic" question. I have managed to create a stacked and grouped bar plot, but I cannot get the text to sit in the middle of each bar segment. Do you know how to write the numbers in that position? Thank you so much.

Example code:

test <- data.frame(variables = c("PE_35", "PE_49"),
                   value1 = c(13,3), value2 = c(75,31),
                   value3 = c(7,17), value4 = c(5,49))
library(reshape2) # for melt
melted <- melt(test, "variables")
melted$cO <- c("A","A","B","B","A","A","B","B")
melted$cat <- ''
melted[melted$variable == 'value1' | melted$variable == 'value2',]$cat <- "0"
melted[melted$variable == 'value3' | melted$variable == 'value4',]$cat <- "1"
names(melted)[3] <- "recuento"

library(ggplot2)
ggplot(melted, aes(x = cat, y = recuento, ymax = max(recuento)*1.05, fill = cO)) +
  geom_bar(stat = 'identity', position = 'stack', col = "black") +
  facet_grid(~ variables) +
  geom_text(aes(label = recuento), size = 5, hjust = 0.5, vjust = 1,
            position = "stack")
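One way to centre the labels is to compute the vertical midpoint of every segment before plotting. The sketch below does that in base R on the same numbers as the post, written out by hand in long form (an assumption: the row order is the one melt() would produce); the column name label_y is hypothetical:

```r
# Same values as in the post, written out in long form.
melted <- data.frame(
  variables = rep(c("PE_35", "PE_49"), times = 4),
  cO        = rep(c("A", "B", "A", "B"), each = 2),
  cat       = rep(c("0", "1"), each = 4),
  recuento  = c(13, 3, 75, 31, 7, 17, 5, 49)
)

# Midpoint of each segment within its bar:
# cumulative height up to the segment minus half the segment.
melted$label_y <- ave(melted$recuento, melted$variables, melted$cat,
                      FUN = function(v) cumsum(v) - v / 2)
melted
```

The labels could then be drawn with geom_text(aes(y = label_y, label = recuento)). The midpoints only line up if the segments are stacked in the same order as the rows, which depends on the ggplot2 version; recent versions also accept position = position_stack(vjust = 0.5) in geom_text(), which avoids the manual computation.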
Hello.

I have a question for Rmarkdown users. Is there any way to set the name of the output document from inside the Rmd? For example, my Rmd file is named "bb.Rmd", but when I knit to PDF I want the PDF to be named something other than "bb.pdf", for example "doc1.pdf". Is there any way to do this?

Thank you very much
Hello.

I am trying to plot a 3D surface given its equation. Let's say that I have the points x, y, z and I plot them. I also compute their regression surface using polynomial regression (fit):

library('rgl')
x <- c(-32.09652, -28.79491, -25.48977, -23.18746, -20.88934, -18.58220, -17.27919)
y <- c(-32.096, -28.794, -25.489, -23.187, -20.889, -18.582, -17.279)
z <- c(12.16344, 28.84962, 22.36605, 20.13733, 79.50248, 65.46150, 44.52274)
plot3d(x, y, z, type="s", col="red", size=1)
fit <- lm(z ~ poly(x,2) + poly(y,2))

In this way, I obtain the coefficients of the surface:

coef(fit)
 (Intercept)  poly(x, 2)1  poly(x, 2)2
3.900045e+01 1.763363e+06 6.683531e+05
  poly(y, 2)1   poly(y, 2)2
-1.763303e+06 -6.683944e+05

So I want to represent the surface

3.900045e+01 + 1.763363e+06*x + 6.683531e+05*x*x - 1.763303e+06*y - 6.683944e+05*y*y

How could I do it? Any ideas?

Thank you very much!
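A note on the formula above: poly(x, 2) builds orthogonal polynomials by default, so the reported coefficients do not multiply raw x and x*x. A sketch that sidesteps the issue is to let predict() evaluate the fitted surface on a grid; the grid size of 20 and the names xg, yg, zg are arbitrary choices for illustration:

```r
x <- c(-32.09652, -28.79491, -25.48977, -23.18746, -20.88934, -18.58220, -17.27919)
y <- c(-32.096, -28.794, -25.489, -23.187, -20.889, -18.582, -17.279)
z <- c(12.16344, 28.84962, 22.36605, 20.13733, 79.50248, 65.46150, 44.52274)

fit <- lm(z ~ poly(x, 2) + poly(y, 2))

# Evaluate the fitted surface on a regular grid via predict(), which applies
# the same orthogonal basis that poly() built during fitting.
xg <- seq(min(x), max(x), length.out = 20)
yg <- seq(min(y), max(y), length.out = 20)
grid <- expand.grid(x = xg, y = yg)          # x varies fastest
zg <- matrix(predict(fit, newdata = grid), nrow = length(xg), ncol = length(yg))

# With rgl loaded, the grid could be added to the existing plot with:
# rgl::surface3d(xg, yg, zg, alpha = 0.5, color = "lightblue")
```

Because x and y are almost identical in this data set, the fit is badly conditioned (hence the coefficients of order 1e6), and the surface can take extreme values away from the line y = x.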
Hello.

I am drawing a graph using graphviz. It works, but now I am trying to use some palettes from the RColorBrewer package. Any idea why this diagram works when the code (in .Rmd) is

```{r, engine='dot', echo=F}
digraph unix{
size=30; ratio=compress;
param [label=" Contrastes paramétricos ", shape=oval, style="filled,rounded,diagonals", fillcolor=dodgerblue3, fontcolor=gray90];
}
```

but it doesn't work if I try to use colors from a palette?

```{r, echo=FALSE}
library("RColorBrewer")
colores <- brewer.pal(11,"PiYG")
```

```{r, engine='dot', echo=F}
digraph unix{
size=30; ratio=compress;
param [label=" Contrastes paramétricos ", shape=oval, style="filled,rounded,diagonals", fillcolor=colores[1], fontcolor=gray90];
}
```

Thank you very much!!
Hello everybody.

I have a statistics question. Let's say that I want to compare the answers of men and women to a yes/no question, but I have many more women than men, so it looks like I cannot use the chi-squared test. Would it be correct to use the U test (the Wilcoxon rank-sum test)? What do you think? The code is below. Thank you so much!!

men <- rep( 0,12 )
women <- c( 0,1,0,0,0,1,0,0,0,rep( 0,114 ),1,rep( 0,199 ) )
wilcox.test( men, women )
chisq.test( men, women )
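Note that chisq.test(x, y) with two vectors expects them to have the same length, so the last call above stops with an error. A sketch of one common alternative for a yes/no answer in two groups: cross-tabulate group against answer and run Fisher's exact test on the 2 x 2 table. The column names and labels are hypothetical, and this is one possible approach rather than a recommendation for this particular study:

```r
men   <- rep(0, 12)
women <- c(0, 1, 0, 0, 0, 1, 0, 0, 0, rep(0, 114), 1, rep(0, 199))

# One row per respondent: group label plus answer.
answers <- data.frame(
  sex    = rep(c("men", "women"), times = c(length(men), length(women))),
  answer = factor(c(men, women), levels = c(0, 1), labels = c("no", "yes"))
)

tab <- table(answers$sex, answers$answer)
tab

# Fisher's exact test works on the table and does not rely on
# large expected counts in every cell.
fisher.test(tab)
```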
Dear R users,

I have a very specific question. I want to know how to create a local git repository from an existing folder (with some documents inside), just as we do when typing git init, but from RStudio. I tried selecting File --> New Project --> Existing Directory and then chose the folder, but I am not sure what I should do next.

Thank you very much for all your advice.
Dear all,

I have a problem with the caption option of the xtable function. Using Rmarkdown, knitr correctly generates a PDF when I write something like this:

```{r xtable, results="asis"}
library( xtable )
variableName <- c( "V03_1" )
age <- c( rep(1,10),rep(2,10),rep(3,10) )
gender <- c( rep("m",15), rep("f",15) )
df <- data.frame( age, gender )
t <- xtable( df, caption = "hello" )
print( t, caption.placement = 'top',comment = FALSE )
```

But if I change it to

t <- xtable(df, caption = variableName)

which is what I really want, it returns a pandoc error:

! Missing $ inserted.
<inserted text>
$
l.112 \caption{V03_1}
pandoc: Error producing PDF from TeX source
Error: pandoc document conversion failed with error 43

I don't know why, because variableName is also a character variable! Any idea?

Thank you very much!
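The "Missing $ inserted" message at \caption{V03_1} is what LaTeX reports when it meets an underscore outside math mode, which would explain why "hello" works and "V03_1" does not. A minimal sketch of escaping the underscore before the string is used as a caption; the helper name escape_underscores is hypothetical:

```r
# Hypothetical helper: escape underscores so that LaTeX does not read them
# as subscript markers outside math mode.
escape_underscores <- function(x) gsub("_", "\\\\_", x)

variableName <- "V03_1"
caption_text <- escape_underscores(variableName)
cat(caption_text, "\n")   # V03\_1
```

The escaped string would then be passed on as xtable(df, caption = caption_text).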
Hello everybody.

I am using the caret package to make predictions from some data. I have "hours", "days" and "temperature", where "hours" are given in decimal form, "days" are the days of the week on which each observation was collected, and "temperature" is the temperature that a user of an air conditioner entered into the device. I have simplified the problem, but the point is that I want to predict the temperature that is going to be chosen, given the time (hour and day of the week). I tried something like this:

library(caret)
hour <- c(12,12.5,12.75,13,14,14.5,16,10,11,14,15.71,13,9,10,12,13,18,20,12.2,13)
day <- c("m","m","t","t","w","w","th","th","f","f","st","st","sn","sn","m","t","w","th","f","st")
temperature <- c(19,20,21,22,20,23,26,27,26,26,25,23,23,20,24,25,25,22,28,26)
df <- data.frame(hour, day, temperature)
inTrain <- createDataPartition(y = df$temperature, p = 0.6, list = FALSE)
training <- df[inTrain,]
testing <- df[-inTrain,]
modelFit <- train(temperature ~ hour + day, data = training, method = "glm")
modelFit
predictions <- predict(modelFit, newdata = testing)

but the predictions have decimals, so I don't know how to treat the temperature variable (it can only take whole-number values). Which model should I use to predict such data? Do you have any advice, or a manual that I could check? Also, I would like to know the correct way of testing the model (if I had just two categories I would use a confusionMatrix, but here I don't have a clue).

Thank you very much!!

--
Aurora González Vidal
@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
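On the evaluation question in the post above, one simple option is to round the numeric predictions to whole degrees and compare them with the observed settings using error measures for numeric outcomes. The sketch below uses made-up observed and predicted values, not output from the model in the post:

```r
# Hypothetical observed and predicted temperatures (not from a fitted model).
observed  <- c(19, 22, 26, 23, 25)
predicted <- c(20.4, 21.6, 24.9, 23.2, 26.7)

rounded <- round(predicted)            # whole degrees: 20 22 25 23 27

rmse <- sqrt(mean((rounded - observed)^2))
mae  <- mean(abs(rounded - observed))
hit  <- mean(rounded == observed)      # share of exactly matched settings

c(RMSE = rmse, MAE = mae, exact = hit)
```

RMSE and MAE keep the information that predicting 24 instead of 25 is a smaller miss than predicting 19, which a confusion matrix over temperature values would treat as equally wrong.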
Dear R users,

I am facing my first time series problem. I have hourly temperature data for 3 years (from 01/01/2013 to 5/02/2016). I would like to use them to predict the temperature of the next hours from the observations.

A subset of the data looks like this:

date <- rep(seq(as.Date("2014-01-01"), as.Date("2014-01-03"), by="days"), each = 24)
hour <- rep(c(paste0("0",0:9,":00:00"), paste0(10:23,":00:00")), 3)
temperature <- c(6.1, 6.8, 6.5, 7.2, 7.1, 7.9, 5.9, 6.8, 7.7, 9.5, 12.6,
                 14.0, 15.9, 17.3, 17.5, 17.2, 15.0, 14.1, 13.1, 11.7, 10.9,
                 11.0, 11.6, 11.0, 11.2, 11.0, 11.0, 11.4, 12.2, 13.7, 12.9,
                 12.9, 12.8, 13.4, 13.9, 14.9, 16.6, 16.0, 15.2, 15.4, 14.7,
                 14.6, 13.3, 13.0, 13.8, 13.1, 12.0, 11.9, 11.8, 11.6, 11.0,
                 11.2, 11.6, 10.6, 9.5, 9.8, 9.9, 11.7, 15.3, 18.6, 20.7,
                 22.2, 22.2, 20.8, 20.2, 18.3, 15.6, 13.6, 12.8, 13.1,
                 13.7, 14.7)
dfExample <- data.frame(date, hour, temperature)

To plot 3 years (from 01/01/2013 to 31/12/2015) I use this code, and I obtained the attached picture. Seasonality can be observed.

tempdf4 <- ts(df4$temperature, frequency=365*24*3)
plot.ts(tempdf4)

Am I doing it right? Could you help me with any information on this type of problem (mainly with the prediction)? For example, if I want to use Arima, what are the arguments of the function given my data structure?

fit=Arima(df4$temperature, seasonal=list(order=c(xxx,xxx,xxx),period=xxx))
plot(forecast(fit))

I could also use some predictions from another source that I have been collecting since January 2016. But I would prefer to understand the simplest way to solve the problem first and then, progressively, understand more complex approaches.

Thank you very much for any kind of help.

--
Aurora González Vidal
Phd student in Data Analytics for Energy Efficiency
Faculty of Computer Sciences
University of Murcia
@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae
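For hourly data with a daily cycle, the frequency of a ts object is the number of observations per cycle, that is 24, rather than the total number of observations. The sketch below uses the three days posted above and base R's arima() instead of the forecast package so that it runs on its own; the model orders are placeholders chosen for illustration, not a recommendation:

```r
temperature <- c(6.1, 6.8, 6.5, 7.2, 7.1, 7.9, 5.9, 6.8, 7.7, 9.5, 12.6,
                 14.0, 15.9, 17.3, 17.5, 17.2, 15.0, 14.1, 13.1, 11.7, 10.9,
                 11.0, 11.6, 11.0, 11.2, 11.0, 11.0, 11.4, 12.2, 13.7, 12.9,
                 12.9, 12.8, 13.4, 13.9, 14.9, 16.6, 16.0, 15.2, 15.4, 14.7,
                 14.6, 13.3, 13.0, 13.8, 13.1, 12.0, 11.9, 11.8, 11.6, 11.0,
                 11.2, 11.6, 10.6, 9.5, 9.8, 9.9, 11.7, 15.3, 18.6, 20.7,
                 22.2, 22.2, 20.8, 20.2, 18.3, 15.6, 13.6, 12.8, 13.1,
                 13.7, 14.7)

# Hourly data with a daily cycle: 24 observations per seasonal period.
temp_ts <- ts(temperature, frequency = 24)

# Assumed orders, for illustration only: AR(1) plus one seasonal difference.
fit <- arima(temp_ts, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 0), period = 24),
             method = "ML")

# Forecast the next 24 hours.
fc <- predict(fit, n.ahead = 24)
fc$pred
```

A ts object holds a single frequency, so with three years of data the yearly cycle would have to be handled separately from the daily one.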
Thank you, it works fine.

Now I am trying to evaluate the performance of the model across time. To do that I use a rolling window, which I understand as a sort of "leave one out".

The example: the data run from the 1st of January to the present, so I use the data from the 1st of January to the 1st of December to fit the model, and then I predict the temperatures of the 2nd of December. As I have the real values, I can compute the RMSE or other metrics. Then I use the data from the 1st of January to the 2nd of December to predict the 24 temperature values of the 3rd of December, and I again compute the RMSE (between the predicted and real values of the 3rd). And so on until I have no more data. I then have several RMSE values; I compute their mean and sd and take this as the evaluation of the model's performance.

The question is: do you know any book or documentation where I can find out how many times I should repeat this process, so that I know where to start? Should I start the rolling before December? I mean, is there any convention? For example, if I have 400 days of data, meaning 9600 (400 * 24) observations, maybe I could use 10 % of the windows for evaluation, which means repeating the process 40 times starting at day 360.

Any source of information will be appreciated.

Sean Porter wrote:

> Try the auto.arima function in the forecast package.
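The rolling-window evaluation described in this reply (fit on all days up to day d - 1, predict the 24 values of day d, record the RMSE, move one day forward) can be sketched in base R as follows. The forecaster naive_day, which simply repeats the last observed day, is a hypothetical stand-in for a fitted model, and the series is simulated:

```r
# Hypothetical stand-in forecaster: repeat the last observed day (24 values).
naive_day <- function(history) tail(history, 24)

# One RMSE per evaluated day, each computed from a model that has only
# seen the days before it.
rolling_rmse <- function(y, first_test_day, forecaster, per_day = 24) {
  n_days <- length(y) %/% per_day
  vapply(first_test_day:n_days, function(d) {
    train  <- y[seq_len((d - 1) * per_day)]
    actual <- y[((d - 1) * per_day + 1):(d * per_day)]
    sqrt(mean((forecaster(train) - actual)^2))
  }, numeric(1))
}

# Simulated hourly series (assumption: 30 days of a daily sine cycle plus noise).
set.seed(42)
y <- 15 + 5 * sin(2 * pi * (0:(30 * 24 - 1)) / 24) + rnorm(30 * 24, sd = 1)

# Evaluate on the last 3 of 30 days, i.e. the final 10 % of the windows.
errs <- rolling_rmse(y, first_test_day = 28, forecaster = naive_day)
c(mean = mean(errs), sd = sd(errs))
```

Replacing naive_day with a function that refits the real model on `history` and returns its 24-step forecast gives the procedure described in the message; the starting day is the only tuning choice.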