Re: [R] spped up a function

PIKAL Petr Tue, 09 Jul 2013 23:39:37 -0700

Hi Santiago

Please keep the conversation on the list; others may have better ideas.

I am still missing the reasoning.

Merge seems to me to be the solution, but I am lost in your reasoning about what
to keep and what to discard from the resulting object.

After the merge I have this:

result <- structure(list(Ring = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("6106933", "6134701", "6140497", "6140719", "6140756",
"6140855", "6143070", "6143090", "6143093", "6175711", "6175726",
"6175730", "6175769", "6175776", "6175784", "6188609", "6188705",
"6195159", "6195171", "6198153", "6198154", "6198156", "6198157",
"6198172"), class = "factor"), jul = c(15135, 15135, 15135, 15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135), timepos = structure(c(1307680575, 1307680740,
1307681040, 1307681340, 1307681640, 1307681940, 1307682240, 1307682540,
1307682780, 1307683080, 1307683380, 1307683680, 1307683980, 1307684280,
1307684397, 1307684424, 1307684484, 1307684490, 1307684580, 1307684880,
1307685180, 1307685243, 1307685321, 1307685336), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), act = c(3822L, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 27L, 60L, 6L, 753L, NA, NA, NA,
78L, 15L, 18L), wd = c("dry", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, "wet", "dry", "wet", "dry", NA, NA, NA, "wet",
"dry", "wet")), .Names = c("Ring", "jul", "timepos", "act", "wd"
), row.names = c(NA, -24L), class = "data.frame")

> result
      Ring   jul             timepos  act   wd
1  6106933 15135 2011-06-10 04:36:15 3822  dry
2  6106933 15135 2011-06-10 04:39:00   NA <NA>
3  6106933 15135 2011-06-10 04:44:00   NA <NA>
4  6106933 15135 2011-06-10 04:49:00   NA <NA>
5  6106933 15135 2011-06-10 04:54:00   NA <NA>
6  6106933 15135 2011-06-10 04:59:00   NA <NA>
7  6106933 15135 2011-06-10 05:04:00   NA <NA>
8  6106933 15135 2011-06-10 05:09:00   NA <NA>
9  6106933 15135 2011-06-10 05:13:00   NA <NA>
10 6106933 15135 2011-06-10 05:18:00   NA <NA>
11 6106933 15135 2011-06-10 05:23:00   NA <NA>
12 6106933 15135 2011-06-10 05:28:00   NA <NA>
13 6106933 15135 2011-06-10 05:33:00   NA <NA>
14 6106933 15135 2011-06-10 05:38:00   NA <NA>
15 6106933 15135 2011-06-10 05:39:57   27  wet
16 6106933 15135 2011-06-10 05:40:24   60  dry
17 6106933 15135 2011-06-10 05:41:24    6  wet
18 6106933 15135 2011-06-10 05:41:30  753  dry
19 6106933 15135 2011-06-10 05:43:00   NA <NA>
20 6106933 15135 2011-06-10 05:48:00   NA <NA>
21 6106933 15135 2011-06-10 05:53:00   NA <NA>
22 6106933 15135 2011-06-10 05:54:03   78  wet
23 6106933 15135 2011-06-10 05:55:21   15  dry
24 6106933 15135 2011-06-10 05:55:36   18  wet

I understand you want to keep only the time values from the GPS data.frame. OK,
this can be done in the last step. But I am a bit lost in the logic for
discarding lines 15-18. Anyway, this may be what you want:

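library(zoo)
# carry the last observed wd forward; na.rm = FALSE keeps a leading NA in place
result$wd <- na.locf(result$wd, na.rm = FALSE)
# keep only the GPS rows, i.e. the rows that have no act value from xact
final <- result[is.na(result$act), ]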
> final
      Ring   jul             timepos act  wd
2  6106933 15135 2011-06-10 04:39:00  NA dry
3  6106933 15135 2011-06-10 04:44:00  NA dry
4  6106933 15135 2011-06-10 04:49:00  NA dry
5  6106933 15135 2011-06-10 04:54:00  NA dry
6  6106933 15135 2011-06-10 04:59:00  NA dry
7  6106933 15135 2011-06-10 05:04:00  NA dry
8  6106933 15135 2011-06-10 05:09:00  NA dry
9  6106933 15135 2011-06-10 05:13:00  NA dry
10 6106933 15135 2011-06-10 05:18:00  NA dry
11 6106933 15135 2011-06-10 05:23:00  NA dry
12 6106933 15135 2011-06-10 05:28:00  NA dry
13 6106933 15135 2011-06-10 05:33:00  NA dry
14 6106933 15135 2011-06-10 05:38:00  NA dry
19 6106933 15135 2011-06-10 05:43:00  NA dry
20 6106933 15135 2011-06-10 05:48:00  NA dry
21 6106933 15135 2011-06-10 05:53:00  NA dry
>
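One caveat, sketched here under assumptions: the full data contain several Rings, and filling wd over the whole merged data frame could carry a value from the end of one bird into the start of the next. In the sketch below, locf_base() is a hypothetical base-R stand-in for na.locf(x, na.rm = FALSE), and the toy data frame is invented for illustration; ave() applies the fill separately within each Ring.

```r
# Hypothetical helper: last observation carried forward in base R.
# Leading NAs stay NA, so the output has the same length as the input.
locf_base <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the latest non-NA value so far
  c(NA, x[!is.na(x)])[idx + 1]    # idx == 0 maps to the leading NA
}

# Invented toy data: two rings, already sorted by Ring and time.
toy <- data.frame(
  Ring = c("A", "A", "A", "B", "B", "B"),
  t    = c(1, 2, 3, 1, 2, 3),
  wd   = c("dry", NA, NA, NA, "wet", NA),
  stringsAsFactors = FALSE
)

# Fill within each Ring so that values never cross from one bird to the next.
toy$wd_filled <- ave(toy$wd, toy$Ring, FUN = locf_base)
print(toy)
```

The first row of Ring B stays NA because no earlier record exists for that Ring, rather than inheriting "dry" from Ring A.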

Regards
Petr

From: Santiago Guallar [mailto:sgual...@yahoo.com]
Sent: Tuesday, July 09, 2013 10:02 PM
To: PIKAL Petr
Subject: Re: [R] spped up a function

Dear Petr,

I wanted the two data sets merged in such a way that the values of the 'wd' 
vector (from the intervals t of 'xact') are assigned to the corresponding 
intervals of 'GPS'. If there is more than one value (i.e. if there is more than 
one interval of 'xact' for the corresponding interval of 'GPS'), then take the 
maximum (i.e. the value of the interval of 'xact' closest to the corresponding 
interval of 'GPS'). This is why the output of the particular sequence of the 
result I copied in the previous message contains only 'dry'.
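The rule described above (for each GPS time, take wd from the latest xact record at or before it) can be sketched without a merge. This is only an illustration under assumptions: it uses base R's findInterval(), a subset of the GPS times from the thread for a single Ring, and variable names (xact_time, xact_wd, gps_time, gps_wd) that are made up for the example.

```r
# Times copied from the thread (one Ring, GMT).
xact_time <- as.POSIXct(c("2011-06-10 04:36:15", "2011-06-10 05:39:57",
                          "2011-06-10 05:40:24", "2011-06-10 05:41:24",
                          "2011-06-10 05:41:30", "2011-06-10 05:54:03",
                          "2011-06-10 05:55:21", "2011-06-10 05:55:36"),
                        tz = "GMT")
xact_wd <- c("dry", "wet", "dry", "wet", "dry", "wet", "dry", "wet")

# A few of the GPS times from the thread.
gps_time <- as.POSIXct(c("2011-06-10 04:39:00", "2011-06-10 05:38:00",
                         "2011-06-10 05:43:00", "2011-06-10 05:53:00"),
                       tz = "GMT")

# For each GPS time, findInterval() gives the index of the last xact time
# that is <= it (0 if the GPS time precedes every xact record).
# xact_time must be sorted in increasing order for this to be valid.
idx <- findInterval(as.numeric(gps_time), as.numeric(xact_time))
idx[idx == 0] <- NA
gps_wd <- xact_wd[idx]
print(data.frame(gps_time, idx, gps_wd))
```

The GPS times 05:43 and 05:53 both fall after the xact record at 05:41:30 ("dry") and before the one at 05:54:03, which is why they also receive "dry".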

Santi


From: PIKAL Petr <petr.pi...@precheza.cz>
To: Santiago Guallar <sgual...@yahoo.com>; r-help <r-help@r-project.org>
Sent: Tuesday, July 9, 2013 11:19 AM
Subject: RE: [R] spped up a function

Hi Santiago

I am a bit confused about how your result is organised and why there are only
'dry' values regardless of the timepos values.

It is not necessary to attach files resulting from dput. Just copy the output
into your mail and anybody can paste it directly into R.

Ring is a factor in xact but an integer in GPS:
> str(xact)
'data.frame':   8 obs. of  5 variables:
$ Ring   : Factor w/ 24 levels "6106933","6134701",..: 1 1 1 1 1 1 1 1
$ jul    : num  15135 15135 15135 15135 15135 ...
$ timepos: POSIXct, format: "2011-06-10 04:36:15" "2011-06-10 05:39:57" ...
$ act    : int  3822 27 60 6 753 78 15 18
$ wd     : chr  "dry" "wet" "dry" "wet" ...
> str(GPS)
'data.frame':   16 obs. of  3 variables:
$ Ring   : int  6106933 6106933 6106933 6106933 6106933 6106933 6106933 6106933 6106933 6106933 ...
$ jul    : num  15135 15135 15135 15135 15135 ...
$ timepos: POSIXct, format: "2011-06-10 04:39:00" "2011-06-10 04:44:00" ...

So I first changed it to a factor in both.

GPS$Ring <- factor(GPS$Ring)

After that I merged both data frames

result <- merge(xact, GPS, all = TRUE)

and here is the result:

dput(result)
structure(list(Ring = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("6106933", "6134701", "6140497", "6140719", "6140756",
"6140855", "6143070", "6143090", "6143093", "6175711", "6175726",
"6175730", "6175769", "6175776", "6175784", "6188609", "6188705",
"6195159", "6195171", "6198153", "6198154", "6198156", "6198157",
"6198172"), class = "factor"), jul = c(15135, 15135, 15135, 15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135, 15135,
15135, 15135), timepos = structure(c(1307680575, 1307680740,
1307681040, 1307681340, 1307681640, 1307681940, 1307682240, 1307682540,
1307682780, 1307683080, 1307683380, 1307683680, 1307683980, 1307684280,
1307684397, 1307684424, 1307684484, 1307684490, 1307684580, 1307684880,
1307685180, 1307685243, 1307685321, 1307685336), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), act = c(3822L, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 27L, 60L, 6L, 753L, NA, NA, NA,
78L, 15L, 18L), wd = c("dry", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, "wet", "dry", "wet", "dry", NA, NA, NA, "wet",
"dry", "wet")), .Names = c("Ring", "jul", "timepos", "act", "wd"
), row.names = c(NA, -24L), class = "data.frame")

There are empty values in the act and wd columns. You can fill them, e.g. with
the na.locf function from the zoo package.

> result$wd
[1] "dry" NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
[13] NA    NA    "wet" "dry" "wet" "dry" NA    NA    NA    "wet" "dry" "wet"
> na.locf(result$wd)
[1] "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry" "dry"
[13] "dry" "dry" "wet" "dry" "wet" "dry" "dry" "dry" "dry" "wet" "dry" "wet"
>
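The NA values come from all = TRUE: rows that exist in only one of the two data frames are kept, and the columns they lack are filled with NA. A minimal sketch with invented toy data (the names x, y, m and the column t are assumptions, not objects from the thread) shows the effect; the explicit order() call makes the time ordering visible rather than relying on merge's default sorting.

```r
# Invented toy data: x plays the role of xact, y the role of GPS.
x <- data.frame(t = c(1, 5), wd = c("dry", "wet"), stringsAsFactors = FALSE)
y <- data.frame(t = c(2, 3, 6))

# The common column t is used as the key; all = TRUE keeps unmatched rows
# from both sides, so rows that come only from y have NA in wd.
m <- merge(x, y, all = TRUE)
m <- m[order(m$t), ]   # order by time explicitly
print(m)
```

Once the rows are interleaved in time order, carrying the last non-NA wd forward assigns each y row the value of the latest x row at or before it.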

Is this what you want?

Regards
Petr


From: Santiago Guallar [mailto:sgual...@yahoo.com]
Sent: Tuesday, July 09, 2013 8:53 AM
To: PIKAL Petr; r-help
Subject: Re: [R] spped up a function

Hi Petr, yes, the function basically consists of merging two time series with
different time intervals: one regular, 'GPS', and one irregular, 'xact' (the
latter containing the binomial variable 'wd' that I want to add to 'GPS').
Apparently my attachments did not go through. Here are the dputs you
requested plus the desired result based on them:

head(xact)
   Ring   jul             timepos  act  wd
6106933 15135 2011-06-10 04:36:15 3822 dry
6106933 15135 2011-06-10 05:39:57   27 wet
6106933 15135 2011-06-10 05:40:24   60 dry
6106933 15135 2011-06-10 05:41:24    6 wet
6106933 15135 2011-06-10 05:41:30  753 dry
6106933 15135 2011-06-10 05:54:03   78 wet
6106933 15135 2011-06-10 05:55:21   15 dry
6106933 15135 2011-06-10 05:55:36   18 wet

head(GPS1, 16) and desired result (added column wd)
      Ring   jul             timepos wd
5  6106933 15135 2011-06-10 04:39:00 dry
6  6106933 15135 2011-06-10 04:44:00 dry
7  6106933 15135 2011-06-10 04:49:00 dry
8  6106933 15135 2011-06-10 04:54:00 dry
9  6106933 15135 2011-06-10 04:59:00 dry
10 6106933 15135 2011-06-10 05:04:00 dry
11 6106933 15135 2011-06-10 05:09:00 dry
12 6106933 15135 2011-06-10 05:13:00 dry
13 6106933 15135 2011-06-10 05:18:00 dry
14 6106933 15135 2011-06-10 05:23:00 dry
15 6106933 15135 2011-06-10 05:28:00 dry
16 6106933 15135 2011-06-10 05:33:00 dry
17 6106933 15135 2011-06-10 05:38:00 dry
18 6106933 15135 2011-06-10 05:43:00 dry
19 6106933 15135 2011-06-10 05:48:00 dry
20 6106933 15135 2011-06-10 05:53:00 dry

Santi
________________________________
From: PIKAL Petr <petr.pi...@precheza.cz>
To: Santiago Guallar <sgual...@yahoo.com>; r-help <r-help@r-project.org>
Sent: Monday, July 8, 2013 11:34 AM
Subject: RE: [R] spped up a function

Hi

It seems to me that you basically want merge, but I may be missing the point.
Try posting

dput(head(xact))
dput(head(GPS))

and the desired result based on those two datasets.

Regards
Petr


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of Santiago Guallar
> Sent: Tuesday, July 02, 2013 7:47 PM
> To: r-help
> Subject: [R] spped up a function
>
> Hi,
>
> I have written a function to assign the values of a certain variable
> 'wd' from a dataset to another dataset. Both contain data from the
> same time period but differ in the length of their time intervals:
> 'GPS' has regular 10-minute intervals whereas 'xact' has irregular
> intervals. I attached simplified text versions from write.table. You
> can also get a dput of 'xact' at this address:
> http://www.megafileupload.com/en/file/431569/xact-dput.html
> The original objects are large and the function takes almost one hour
> to finish.
> Here's the function:
>
> fxG <- function(xact, GPS) {
>   l <- rep('A', nrow(GPS))
>   # the process is carried out for several individuals identified by 'Ring'
>   v <- unique(GPS$Ring)
>   for (k in 1:length(v)) {
>     I <- v[k]
>     df <- xact[xact$Ring == I, ]
>     for (i in 1:nrow(GPS)) {
>       # the code runs along the whole data.frame for each i; it'd save time
>       # to make it stop with the last record of each i instead
>       if (GPS[i, ]$Ring == I) {
>         u <- df$timepos <= GPS[i, ]$timepos
>         # fill vector l for each interval t from xact <= each interval
>         # from GPS (take the max if there's > 1 interval)
>         l[i] <- df[max(which(u == TRUE)), ]$wd
>       }
>     }
>   }
>   return(l)
> }
>
> vwd <- fxG(xact, GPS)
>
>
> My question is: how can I speed up (optimize) this function?
>
> Thank you for your help
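A possible vectorised alternative to the double loop, offered as a sketch rather than a drop-in replacement: for each Ring, base R's findInterval() locates the last xact record at or before every GPS time in one call. The function name fxG_fast and the toy data are hypothetical; the sketch assumes both data frames have Ring and timepos columns and that xact has wd. Unlike the loop above, it returns NA where no xact record precedes a GPS time.

```r
# Hypothetical vectorised version of the lookup (assumed column names:
# Ring, timepos in both data frames; wd in xact).
fxG_fast <- function(xact, GPS) {
  out <- rep(NA_character_, nrow(GPS))
  gps_ring  <- as.character(GPS$Ring)    # compare rings as text so that
  xact_ring <- as.character(xact$Ring)   # factor vs. integer does not matter
  for (ring in unique(gps_ring)) {
    x <- xact[xact_ring == ring, ]
    x <- x[order(x$timepos), ]           # findInterval needs sorted times
    rows <- which(gps_ring == ring)
    idx <- findInterval(as.numeric(GPS$timepos[rows]), as.numeric(x$timepos))
    idx[idx == 0] <- NA                  # no xact record at or before
    out[rows] <- as.character(x$wd)[idx]
  }
  out
}

# Invented toy data: two rings, times given as seconds since the epoch.
t0 <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "GMT")
xact_toy <- data.frame(Ring = c(1, 1, 2), timepos = t0(c(0, 100, 50)),
                       wd = c("dry", "wet", "wet"), stringsAsFactors = FALSE)
GPS_toy  <- data.frame(Ring = c(1, 1, 2, 2), timepos = t0(c(10, 150, 20, 60)))

vwd_toy <- fxG_fast(xact_toy, GPS_toy)
print(vwd_toy)
```

The remaining loop runs once per individual instead of once per GPS row per individual, so the work per Ring is a single sorted lookup.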


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
