from:"aurora . gonzalez"

[R] lme4 package installation

2015-08-13 Thread aurora . gonzalez


Hello

I've downloaded the tar.gz file of the package "lme4", and when I run the
command:


install.packages("lme4_1.1-8.tar.gz", repos = NULL, type = "source")

an error appears that aborts the installation:


In file included from external.cpp:8:0:
predModule.h:12:23: fatal error: RcppEigen.h: No such file or directory
compilation terminated.
make: *** [external.o] Error 1
ERROR: compilation failed for package ‘lme4’
* removing ‘/home/aurora/R/x86_64-pc-linux-gnu-library/3.2/lme4’



Does anyone know how to fix it? Thank you very much!


My sessionInfo:


R version 3.2.1 (2015-06-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
[7] base

loaded via a namespace (and not attached):
[1] tools_3.2.1
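
A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the missing header RcppEigen.h belongs to the RcppEigen package, which lme4 links against at compile time. With repos = NULL, install.packages() does not fetch dependencies, so they have to be installed beforehand. The helper name missing_pkgs below is hypothetical.

```r
# Hypothetical helper: report which of the given packages are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# lme4 compiles against the headers shipped by Rcpp and RcppEigen.
to_install <- missing_pkgs(c("Rcpp", "RcppEigen"))

# Only touch the network / the tarball in an interactive session.
if (interactive()) {
  if (length(to_install) > 0) install.packages(to_install)
  install.packages("lme4_1.1-8.tar.gz", repos = NULL, type = "source")
}
```

lme4 also needs its other dependencies in place; installing it once from CRAN with install.packages("lme4") would pull them all in.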

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] GA package integer hyperparameters optimization

2017-04-10 Thread AURORA GONZALEZ VIDAL

Hello everybody.

I am using the GA package [1] to optimize the hyperparameters of an SVM,
as is done in this example:
http://stackoverflow.com/questions/32026436/how-to-optimize-parameters-using-genetic-algorithms

However, when I try to adapt the example to random forest, the optimization
takes a very long time. It might be because the hyperparameters of random
forest are integers (ntree, mtry, nodesize), but I don't know if there is a
way to specify that in the algorithm. Any suggestion would be very much
appreciated. Thank you!

The code:

library(GA)
library("randomForest")

data(Ozone, package="mlbench")
Data <- na.omit(Ozone)

# Setup the data for cross-validation
K = 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
  train_data = Data[fold_inds != i, , drop = FALSE],
  test_data = Data[fold_inds == i, , drop = FALSE]))

# Given the values of the parameters 'ntree', 'mtry' and 'nodesize', return the
# RMSE of the model over the test data
evalParamsRF <- function(train_data, test_data, ntree, mtry, nodesize) {
  # Train
  model <- randomForest(V4 ~ ., data = train_data, ntree = ntree,
                        mtry = mtry, nodesize = nodesize, proximity = TRUE)
  # Test (the square root turns the mean squared error into an RMSE)
  rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
  return (rmse)
}

fitnessFuncRF <- function(x, Lst_CV_Data) {
  # Retrieve the RF parameters
  ntree_val <- x[1]
  mtry_val <- x[2]
  nodesize_val <- x[3]

  # Use cross-validation to estimate the RMSE for each split of the dataset
  rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
    evalParamsRF(train_data, test_data, ntree_val, mtry_val, nodesize_val)))

  # As fitness measure, return minus the average RMSE (over the
  # cross-validation folds), so that by maximizing fitness we are
  # minimizing the RMSE
  return (-mean(rmse_vals))
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFuncRF, lst_CV_data,
  names = names(theta_min),
  min = theta_min, max = theta_max,
  popSize = 50, maxiter = 10)

summary(results)
summary(results)$solution
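
One common workaround, sketched here under assumptions rather than taken from the GA documentation, is to keep type = "real-valued" and round the chromosome to integers inside the fitness function. The helper name decode_rf_params is hypothetical.

```r
# Hypothetical helper: map a real-valued GA chromosome to integer settings,
# clamped to the search bounds.
decode_rf_params <- function(x, lower, upper) {
  x_int <- round(x)                          # nearest integer
  x_int <- pmin(pmax(x_int, lower), upper)   # stay inside [lower, upper]
  storage.mode(x_int) <- "integer"
  names(x_int) <- names(lower)
  x_int
}

theta_min <- c(ntree = 100, mtry = 2, nodesize = 3)
theta_max <- c(ntree = 1000, mtry = 7, nodesize = 20)

# A chromosome as the GA might propose it
decode_rf_params(c(512.7, 6.9, 2.6), theta_min, theta_max)
```

Calling this on x at the top of fitnessFuncRF would give randomForest whole numbers. The run time is probably dominated by the number of forests fitted (popSize x maxiter x K folds), so rounding alone may not make the search fast.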



Links:
--
[1] https://cran.r-project.org/web/packages/GA/index.html


--
Aurora González Vidal
Ph.D. student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzal...@um.es
T. 868 88 7866
sae.saiblogs.inf.um.es


[R] dissimilarity matrix using SAX distance

2016-08-16 Thread AURORA GONZALEZ VIDAL

Dear all,

I'm trying to cluster some data using the SAX distance described in the
paper "A symbolic representation of time series, with implications for
streaming algorithms" http://www.cs.ucr.edu/~eamonn/SAX.pdf

Once I have my data in matrix format, which function can I use to compute
the dissimilarity matrix? There are several functions that compute the
distance between two SAX data series:

diss.MINDIST.SAX(x, y, w, alpha, plot=TRUE)
Func.dist(x, y, matrix, n)

but filling the matrix with two nested loops is very slow, and I think
there should already be an implementation. Do you have any idea?

I have already converted the data into series of "a", "b", "c", ... symbols,
so I would appreciate either a direct computation of the SAX matrix from my
raw data OR one that uses the data already converted to SAX format.

Thank you for any suggestion!


--
Aurora González Vidal
Phd student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae


[R] optimize the filling of a diagonal matrix (two for loops)

2016-08-18 Thread AURORA GONZALEZ VIDAL

Hello

I have two for loops that I am trying to optimize. I looked into
vectorization and into the functions of the apply family, but I cannot get
it to work. I am writing my code with a small data set. With this size
there is no problem, but sometimes I will have hundreds of rows, so it is
really important to optimize the code. Any suggestion will be very
welcome.

library("TSMining")
dataS = data.frame(V1 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V2 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V3 = sample(c(1,2,3,4), 30, replace = TRUE),
                   V4 = sample(c(1,2,3,4), 30, replace = TRUE))
saxM = Func.matrix(5)
colnames(saxM) = 1:5
rownames(saxM) = 1:5
matrixPrepared = matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))

for(i in 1:(nrow(dataS)-1)){
  for(j in (1+i):nrow(dataS)){
    matrixPrepared[i,j] = Func.dist(as.character(dataS[i,]),
                                    as.character(dataS[j,]), saxM, n=60)
  }
}
matrixPrepared
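
A loop-free way to fill the upper triangle, given as a sketch: enumerate the (i, j) pairs once with combn() and assign all results through matrix indexing. The function pairwise_upper and the toy distance below are hypothetical stand-ins; the distance function is still called once per pair, so most of any speed-up would have to come from a faster distance.

```r
# Hypothetical helper: upper-triangular matrix of pairwise distances.
# 'rows' is a list of vectors, 'dist_fun' any function of two vectors.
pairwise_upper <- function(rows, dist_fun) {
  n <- length(rows)
  out <- matrix(NA_real_, nrow = n, ncol = n)
  idx <- utils::combn(n, 2)                  # 2 x choose(n, 2), with i < j
  out[t(idx)] <- apply(idx, 2, function(k) dist_fun(rows[[k[1]]], rows[[k[2]]]))
  out
}

# Toy distance standing in for the SAX distance (an assumption for this demo)
toy_dist <- function(a, b) sum(abs(a - b))
rows <- list(c(1, 2), c(1, 3), c(4, 4))
pairwise_upper(rows, toy_dist)
```

With the data frame above, the rows could be supplied as lapply(seq_len(nrow(dataS)), function(i) as.character(dataS[i, ])) and the distance as a small wrapper around Func.dist.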

Thank you!


--
Aurora González Vidal
Phd student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae


[R] geom_text in ggplot (position)

2015-05-12 Thread AURORA GONZALEZ VIDAL

Hello everybody.

I have an "aesthetic" question. I have managed to create a stacked and
grouped bar plot, but I cannot manage to put the text in the middle of
each bar segment. Do you know how to write the numbers in that position?

Thank you so much.

Example code:

test  <- data.frame(variables =  c("PE_35", "PE_49"),
    value1=c(13,3),
    value2=c(75,31),
    value3=c(7,17),
    value4 =c(5,49))

library(reshape2) # for melt

melted <- melt(test, "variables")
melted$cO <- c("A","A","B","B","A","A","B","B")

melted$cat <- ''
melted[melted$variable == 'value1' | melted$variable == 'value2',]$cat <-
"0"
melted[melted$variable == 'value3' | melted$variable == 'value4',]$cat <-
"1"

names(melted)[3] <- "recuento"

library(ggplot2)

ggplot(melted, aes(x = cat, y = recuento,ymax=max(recuento)*1.05, fill =
cO)) +
  geom_bar(stat = 'identity', position = 'stack', col="black") +
facet_grid(~ variables)+
  geom_text(aes(label = recuento), size = 5, hjust = 0.5, vjust = 1,
position ="stack")
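
One way to get centred labels, sketched with base R only: compute the midpoint of every stacked segment as the cumulative sum within each bar minus half the segment height, then use that column as the y position of the labels. The helper name stack_midpoints and the small bars data frame (the PE_35 panel of the example) are illustrative.

```r
# Hypothetical helper: midpoint of each segment within its stacked bar.
stack_midpoints <- function(value, group) {
  ave(value, group, FUN = function(v) cumsum(v) - v / 2)
}

# The PE_35 panel of the example, written out by hand
bars <- data.frame(cat = c("0", "0", "1", "1"),
                   cO = c("A", "B", "A", "B"),
                   recuento = c(13, 75, 7, 5))
bars$ypos <- stack_midpoints(bars$recuento, bars$cat)
bars
```

The label layer would then be geom_text(aes(y = ypos, label = recuento)) without position = "stack"; the rows must be in the same order in which ggplot2 stacks the segments. In ggplot2 2.2.0 and later, position = position_stack(vjust = 0.5) in geom_text achieves the same without a helper column.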


--
Aurora González Vidal

Sección Apoyo Estadístico.
Servicio de Apoyo a la Investigación (SAI).
Vicerrectorado de Investigación.
Universidad de Murcia
Edif. SACE . Campus de Espinardo.
30100 Murcia

@. aurora.gonzal...@um.es
T. 868 88 7315
F. 868 88 7302
www.um.es/sai
www.um.es/ae


[R] Rmarkdown / knitr naming the output file

2015-07-06 Thread AURORA GONZALEZ VIDAL

Hello.
I have a question for Rmarkdown users.

Is there any way to set the name of the output document from inside the Rmd?

For example, my file is named "bb.Rmd", but when I knit it to PDF I want
the result to be named something other than "bb.pdf", for example
"doc1.pdf". Is there any way to do this?

Thank you very much


--
Aurora González Vidal

Sección Apoyo Estadístico.
Servicio de Apoyo a la Investigación (SAI).
Vicerrectorado de Investigación.
Universidad de Murcia
Edif. SACE . Campus de Espinardo.
30100 Murcia

@. aurora.gonzal...@um.es
T. 868 88 7315
F. 868 88 7302
www.um.es/sai
www.um.es/ae


[R] rgl 3d surface

2015-07-15 Thread AURORA GONZALEZ VIDAL

Hello.

I am trying to plot a 3d surface given its equation.
So, let's say that I have the points x, y, z and I plot them. I also compute
their regression surface using polynomial regression (fit):

library('rgl')
x <- c(-32.09652, -28.79491, -25.48977, -23.18746,-20.88934, -18.58220,
-17.27919)
y <- c(-32.096, -28.794, -25.489, -23.187,-20.889, -18.582, -17.279)
z <- c(12.16344, 28.84962, 22.36605, 20.13733, 79.50248, 65.46150,44.52274)
plot3d(x,y,z, type="s", col="red", size=1)

fit <- lm(z ~ poly(x,2) + poly(y,2))

In this way, I obtain the coefficients of the surface

coef(fit)

  (Intercept)   poly(x, 2)1   poly(x, 2)2
 3.900045e+01  1.763363e+06  6.683531e+05
  poly(y, 2)1   poly(y, 2)2
-1.763303e+06 -6.683944e+05

So I want to represent the surface
3.900045e+01 +1.763363e+06*x + 6.683531e+05*x*x
-1.763303e+06*y-6.683944e+05*y*y

How could I do it? Any ideas?
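
A caveat worth stating: poly() builds orthogonal polynomials by default, so the coefficients printed by coef(fit) multiply transformed basis columns and cannot be plugged into raw x and y as in the expression above. A safer route, sketched below, is to let predict() evaluate the fitted surface on a grid; the grid size of 20 is an arbitrary choice.

```r
x <- c(-32.09652, -28.79491, -25.48977, -23.18746, -20.88934, -18.58220,
       -17.27919)
y <- c(-32.096, -28.794, -25.489, -23.187, -20.889, -18.582, -17.279)
z <- c(12.16344, 28.84962, 22.36605, 20.13733, 79.50248, 65.46150, 44.52274)

fit <- lm(z ~ poly(x, 2) + poly(y, 2))

# Regular grid spanning the observed range
gx <- seq(min(x), max(x), length.out = 20)
gy <- seq(min(y), max(y), length.out = 20)
grid <- expand.grid(x = gx, y = gy)          # x varies fastest

# Rows of zhat follow gx, columns follow gy
zhat <- matrix(predict(fit, newdata = grid),
               nrow = length(gx), ncol = length(gy))

# With rgl loaded and the points already plotted:
# rgl::surface3d(gx, gy, zhat, alpha = 0.5, col = "blue")
```

Because x and y are almost identical in this data set, the fit is badly conditioned (hence the coefficients of order 1e+06), and the surface away from the line x = y should be read with caution.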

Thank you very much!


--
Aurora González Vidal

Sección Apoyo Estadístico.
Servicio de Apoyo a la Investigación (SAI).
Vicerrectorado de Investigación.
Universidad de Murcia
Edif. SACE . Campus de Espinardo.
30100 Murcia

@. aurora.gonzal...@um.es
T. 868 88 7315
F. 868 88 7302
www.um.es/sai
www.um.es/ae


[R] graphviz, Rmarkdown, colorBrewer

2015-07-26 Thread AURORA GONZALEZ VIDAL

Hello. I am drawing a graph using graphviz. It works, but now I am trying
to use some palettes from the RColorBrewer package. Any idea why this
diagram works when the code (in .Rmd) is

```{r, engine='dot', echo=F}
digraph unix{
  size=30;
  ratio=compress;
 
  param [label="  Contrastes paramétricos  ", shape=oval,
   style="filled,rounded,diagonals",
fillcolor=dodgerblue3,
   fontcolor=gray90];
 
```

but it doesn't work if I try to use some colors of any palette

```{r, echo=FALSE}
library("RColorBrewer")
colores <- brewer.pal(11,"PiYG")
```

```{r, engine='dot', echo=F}
digraph unix{
  size=30;
  ratio=compress;
 
  param [label="  Contrastes paramétricos  ", shape=oval,
   style="filled,rounded,diagonals",
fillcolor=colores[1],
   fontcolor=gray90];
 
```

Thank you very much!!


--
Aurora González Vidal

Sección Apoyo Estadístico.
Servicio de Apoyo a la Investigación (SAI).
Vicerrectorado de Investigación.
Universidad de Murcia
Edif. SACE . Campus de Espinardo.
30100 Murcia

@. aurora.gonzal...@um.es
T. 868 88 7315
F. 868 88 7302
www.um.es/sai
www.um.es/ae


[R] compare grupos dichotomus dependent variable

2015-08-07 Thread AURORA GONZALEZ VIDAL

Hello everybody. I have a statistics question:

Let's say that I want to compare the answers of men and women to a yes/no
question, but I have many more women than men, so it looks like I cannot
use the chi-squared test. Would it be correct to use the U test (the
Wilcoxon rank-sum test)? What do you think? The code is below, thank you
so much!

men<-rep( 0,12 )
women <- c( 0,1,0,0,0,1,0,0,0,rep( 0,114 ),1,rep( 0,199 ) )
wilcox.test( men, women )
chisq.test( men, women )
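
As written, chisq.test(men, women) stops with an error because the two vectors have different lengths. For a yes/no answer in two groups, the usual input is a 2 x 2 contingency table. The sketch below builds that table and applies Fisher's exact test, one option often suggested when expected counts are small; whether it is the right test for the study is a separate question.

```r
men <- rep(0, 12)
women <- c(0, 1, 0, 0, 0, 1, 0, 0, 0, rep(0, 114), 1, rep(0, 199))

# One row per respondent: group label and answer
group <- factor(rep(c("men", "women"), c(length(men), length(women))))
answer <- factor(c(men, women), levels = c(0, 1), labels = c("no", "yes"))

tab <- table(group, answer)   # 2 x 2 table of counts
tab
res <- fisher.test(tab)
res$p.value
```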


--
Aurora González Vidal

Sección Apoyo Estadístico.
Servicio de Apoyo a la Investigación (SAI).
Vicerrectorado de Investigación.
Universidad de Murcia
Edif. SACE . Campus de Espinardo.
30100 Murcia

@. aurora.gonzal...@um.es
T. 868 88 7315
F. 868 88 7302
www.um.es/sai
www.um.es/ae


[R] Rstudio and GIT

2015-01-14 Thread AURORA GONZALEZ VIDAL

Dear R users, I have a very specific question.

I want to know how to create a local git repository from an existing folder
(with some documents inside), just like we do when typing

git init

but from Rstudio.

I tried selecting File --> New Project --> Existing Directory and then
selected the folder, but I am not sure what I should do next.
Thank you very much for all your advice.


[R] xtable caption knitr

2015-02-24 Thread AURORA GONZALEZ VIDAL

Dear all,
I have a problem with the caption option on the xtable function.

Using Rmarkdown, knitr correctly generates a PDF when I write something
like this:

```{r xtable, results="asis"}
library( xtable )
variableName  <- c( "V03_1" )
age <- c( rep(1,10),rep(2,10),rep(3,10) )
gender <- c( rep("m",15), rep("f",15) )

df <- data.frame( age, gender )

t <- xtable( df, caption = "hello" )     
print( t, caption.placement = 'top',comment = FALSE )  
```

But if I change to

t <- xtable(df, caption = variableName)   

which is what I really want, it returns a pandoc error:

! Missing $ inserted.

    $
l.112 \caption{V03_1}

pandoc: Error producing PDF from TeX source
Error: pandoc document conversion failed with error 43

I don't know why because variableName is also a character variable!

Any idea? Thank you very much!
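
A likely explanation, stated as an assumption: the caption seems to reach LaTeX unescaped, and a bare underscore is only valid in math mode, hence "Missing $ inserted". Escaping the underscore before passing the caption avoids that. The helper name escape_underscore is hypothetical.

```r
# Hypothetical helper: escape underscores so LaTeX treats them as text.
escape_underscore <- function(x) {
  # In a regex replacement, "\\\\" yields one literal backslash
  gsub("_", "\\\\_", x)
}

variableName <- c("V03_1")
safe_caption <- escape_underscore(variableName)
cat(safe_caption, "\n")   # what LaTeX will see: V03\_1
```

The table would then be created with xtable(df, caption = escape_underscore(variableName)).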


--
Aurora González Vidal

Sección Apoyo Estadístico.
Servicio de Apoyo a la Investigación (SAI).
Vicerrectorado de Investigación.
Universidad de Murcia
Edif. SACE . Campus de Espinardo.
30100 Murcia

@. aurora.gonzal...@um.es
T. 868 88 7315
F. 868 88 7302
www.um.es/sai
www.um.es/ae


[R] predictions several categories

2015-10-12 Thread AURORA GONZALEZ VIDAL

Hello everybody.

I am using the caret package to make predictions from some data.
I have "hours", "days" and "temperature", where "hours" are given in
decimal form, "days" are the days of the week on which each observation was
collected, and "temperature" is the temperature that a user of the air
conditioning entered in the device.

I have simplified the problem, but the point is that I want to predict the
temperature that is going to be chosen given the time (hour and day of the
week).

I tried something like this:

library(caret)

hour <- c(12,12.5,12.75,13,14,14.5,16,10,11,14,15.71,13,9,10,12,13,18,20,
          12.2,13)
day <- c("m","m","t","t","w","w","th","th","f","f","st","st","sn","sn",
         "m","t","w","th","f","st")
temperature <- c(19,20,21,22,20,23,26,27,26,26,25,23,23,20,24,25,25,22,28,
                 26)
df <- data.frame(hour,day,temperature)

inTrain <- createDataPartition(y=df$temperature, p=0.6,list=F)
training <- df[inTrain,]
testing <- df[-inTrain,]

modelFit <- train(temperature ~ hour+day,data=training, method="glm")
modelFit
predictions <- predict(modelFit, newdata=testing)

but the predictions have decimals, so I don't know how to treat the
temperature variable (because it will only ever take whole-number values).
Which model should I use to predict such data? Do you have any advice, or a
manual that I could check?

Also, I would like to know the correct way of testing the model (usually,
if I had just two categories, I would use a confusionMatrix, but here I
don't have any clue).

Thank you very very much!!


--
Aurora González Vidal

@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae


[R] hourly prediction time series

2016-02-05 Thread AURORA GONZALEZ VIDAL


Dear R users,

I am facing my first time series problem. I have hourly temperature data
for 3 years (from 01/01/2013 to 5/02/2016). I would like to use them to
PREDICT THE TEMPERATURE OF THE NEXT HOURS according to the observations.

A subset of the data look like this:

# each = 24 repeats every date for its 24 hourly readings
date <- rep(seq(as.Date("2014-01-01"), as.Date("2014-01-03"), by="days"),
            each = 24)
hour <- rep(c(paste0("0",0:9,":00:00"), paste0(10:23,":00:00")),3)
temperature <- c(6.1, 6.8, 6.5, 7.2, 7.1, 7.9, 5.9, 6.8, 7.7, 9.5, 12.6,
                 14.0, 15.9, 17.3, 17.5, 17.2, 15.0, 14.1, 13.1, 11.7, 10.9,
                 11.0, 11.6, 11.0, 11.2, 11.0, 11.0, 11.4, 12.2, 13.7, 12.9,
                 12.9, 12.8, 13.4, 13.9, 14.9, 16.6, 16.0, 15.2, 15.4, 14.7,
                 14.6, 13.3, 13.0, 13.8, 13.1, 12.0, 11.9, 11.8, 11.6, 11.0,
                 11.2, 11.6, 10.6, 9.5, 9.8, 9.9, 11.7, 15.3, 18.6, 20.7,
                 22.2, 22.2, 20.8, 20.2, 18.3, 15.6, 13.6, 12.8, 13.1, 13.7,
                 14.7)

dfExample <- data.frame(date, hour, temperature)

To plot 3 years (from 01/01/2013 to 31/12/2015) I use this code, and I
obtained the attached picture. Seasonality can be observed.

tempdf4 <- ts(df4$temperature, frequency=365*24*3)
plot.ts(tempdf4)

Am I doing it right? Could you help me with any information on this type of
problem (mainly the prediction)? For example, if I want to use Arima, what
should the arguments of the function be, given my data structure?

library(forecast)
fit=Arima(df4$temperature, seasonal=list(order=c(xxx,xxx,xxx),period=xxx))
plot(forecast(fit))
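
On the frequency argument, a hedged sketch: in ts(), frequency is the number of observations per seasonal cycle, so hourly data with a daily pattern would normally use frequency = 24 rather than the total number of observations. The series below is simulated (an assumption, not the real data) to keep the example self-contained.

```r
set.seed(1)
# Simulated stand-in: 3 days of hourly temperatures with a daily cycle
temperature_sim <- 15 + 5 * sin(2 * pi * (0:71) / 24) + rnorm(72, sd = 0.5)

y <- ts(temperature_sim, frequency = 24)   # 24 observations per day

# Classical decomposition needs at least two full cycles; here there are three
parts <- decompose(y)
round(parts$figure, 2)    # estimated effect of each hour of the day

# With the forecast package installed, a model could then be fitted with
# forecast::auto.arima(y) and plotted with plot(forecast::forecast(fit, h = 24)).
```

A ts object holds a single seasonal period, so a yearly cycle on top of the daily one would need a different tool.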

I could use also some predictions from other source that I am collecting
since January, 2016. But I would prefer to understand the simplest way to
solve the problem and then, progressively, understand more complex
approaches.

Thank you very much for any kind of help.


--
Aurora González Vidal
Phd student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae

[R] evaluation + Re: hourly prediction time series

2016-02-07 Thread AURORA GONZALEZ VIDAL

  Thank you, it works fine.

Now I am trying to evaluate the performance of the model across time. To do
that I use a rolling window, which I understand as a sort of "leave one
out".

The example:

The data run from the 1st of January to the present, so I use the data from
the 1st of January to the 1st of December to fit the model, and then I
predict the temperatures of the 2nd of December. As I have the real ones, I
can compute the RMSE or other metrics.
Then I use the data from the 1st of January to the 2nd of December to
predict the 24 temperature values of the 3rd of December, and again I
compute the RMSE (between the predicted and the real values of the 3rd).
And so on, until I have no more data.
In the end I have several RMSEs; I compute their mean and sd and take them
as the evaluation of the model's performance.
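
The procedure described above can be sketched as follows. The forecaster used here, seasonal_naive (repeat the last observed day), is a placeholder assumption so that the example runs without extra packages; any function taking the history and returning h forecasts could be passed instead. Both function names are hypothetical.

```r
# Placeholder forecaster (an assumption): tomorrow equals the last observed day.
seasonal_naive <- function(history, h = 24) tail(history, h)

# Rolling-origin evaluation: for every test day d, fit on days 1..(d - 1),
# forecast the h values of day d, and record the RMSE.
rolling_origin_rmse <- function(y, first_test_day,
                                forecast_fun = seasonal_naive, h = 24) {
  n_days <- length(y) %/% h
  stopifnot(first_test_day >= 2, first_test_day <= n_days)
  vapply(first_test_day:n_days, function(d) {
    history <- y[seq_len((d - 1) * h)]
    actual <- y[((d - 1) * h + 1):(d * h)]
    sqrt(mean((forecast_fun(history, h) - actual)^2))
  }, numeric(1))
}

# Toy series: the same daily profile, shifted up by 1 degree every day
y_toy <- rep(1:24, 5) + rep(0:4, each = 24)
rmse_by_day <- rolling_origin_rmse(y_toy, first_test_day = 3)
c(mean = mean(rmse_by_day), sd = sd(rmse_by_day))
```

The argument first_test_day is where the choice of starting point asked about in this message enters: a smaller value gives more RMSE values but shorter training histories for the earliest ones.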

The question is: do you know any book or documentation where I can check
how many times I should repeat this process, so that I know where to start?
Should I start the rolling before December? I mean, is there any
convention? For example, if I have 400 days of data, meaning 9600 (400 * 24)
observations, maybe I could use 10% of the days for evaluation, which means
repeating the process 40 times, starting from day 360.

Any source of information will be appreciated.

Sean Porter wrote:

> Try the auto.arima function in the forecast package..
>
> Regards,
>
> DR SEAN PORTER
> Scientist
>
> South African Association for Marine Biological Research
> Direct Tel: +27 (31) 328 8169   Fax: +27 (31) 328 8188
> E-mail: spor...@ori.org.za Web: www.saambr.org.za[1]
> 1 King Shaka Avenue, Point, Durban 4001 KwaZulu-Natal South Africa
> PO Box 10712, Marine Parade 4056 KwaZulu-Natal South Africa
>

Links:
------
[1] http://www.saambr.org.za


--
Aurora González Vidal
Phd student in Data Analytics for Energy Efficiency

Faculty of Computer Sciences
University of Murcia

@. aurora.gonzal...@um.es
T. 868 88 7866
www.um.es/ae

