sapply() is not significantly faster than a for loop. Vectorization generally is, but you pay for it in RAM.

> dat.oc <- oc[dat$Class,]
> dat$Flag <- ifelse(with(dat.oc,(Open<=dat$Close_date & dat$Close_date<=Close) | (Open1<=dat$Close_date & dat$Close_date<=Close1)),"Valid","Invalid")
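
For a self-contained version of the same idea, here is a minimal sketch run on the sample data from the quoted message below. The helper name flag_dates is hypothetical, and it assumes the columns of oc alternate Open, Close, Open1, Close1, and so on. It uses match() for the row lookup (character row indexing on a data frame partially matches row names, which match() avoids) and checks every Open/Close pair, including Open2/Close2, with the same inclusive bounds as above.

```r
# Sample data from the quoted message
ini <- as.Date("2010-01-01")
offsets <- 0:6
oc <- data.frame(
  Open   = ini + offsets,
  Close  = ini + 365 + offsets,
  Open1  = ini + 365 * 2 + offsets,
  Close1 = ini + 365 * 3 + offsets,
  Open2  = ini + 365 * 4 + offsets,
  Close2 = ini + 365 * 5 + offsets,
  row.names = c("AAA", "C", "AA", "A", "CC", "BB", "B")
)
dat <- data.frame(
  Class = c("AAA", "C", "CC", "BB", "B", "A"),
  Close_date = c(ini, ini, ini, ini + 109, ini + 39, ini + 24),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assumes columns alternate Open, Close, Open1, Close1, ...
flag_dates <- function(dat, oc) {
  rows <- match(dat$Class, rownames(oc))   # exact match: one row of oc per row of dat
  inside <- rep(FALSE, nrow(dat))
  for (j in seq(1, ncol(oc), by = 2)) {    # loops over column pairs, not over rows
    inside <- inside |
      (oc[rows, j] <= dat$Close_date & dat$Close_date <= oc[rows, j + 1])
  }
  ifelse(inside, "Valid", "Invalid")
}

dat$Flag <- flag_dates(dat, oc)
dat
```

On the sample data this flags rows 2 and 3 (classes C and CC) as "Invalid" and the other four rows, including AAA, as "Valid". The only loop is over the three column pairs, so the work per pair stays vectorized across all rows of dat.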

If you are really hurting for RAM, you might take a rather less computationally efficient approach:

> dat$Flag <- sapply(seq_len(nrow(dat)), function(idx, ddat, toc) {
    cl  <- ddat[idx, "Class"]        # class label for this row
    cld <- ddat[idx, "Close_date"]   # close date for this row
    # "Valid" if the date falls inside either open/close interval for its class
    if ((toc[cl, "Open"] <= cld && cld <= toc[cl, "Close"]) ||
        (toc[cl, "Open1"] <= cld && cld <= toc[cl, "Close1"])) "Valid" else "Invalid"
  }, ddat = dat, toc = oc)
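
As for why the loop in the quoted message below flags the first record as "Invalid": as far as I can tell, for (i in length(ind)) iterates over the single value length(ind) (6 here), so the body runs once and every date is compared against the intervals for the last class, "A". seq_along(ind) gives the index sequence that was presumably intended. A tiny illustration, using a stand-in ind equal to the row indices that match() returns for the sample data:

```r
ind <- c(1, 2, 5, 6, 7, 4)   # stand-in for the row indices computed in the quoted code

# Loop as written in the quoted message: iterates over one value only
visited <- integer(0)
for (i in length(ind)) visited <- c(visited, i)
visited        # a single iteration, with i equal to length(ind)

# Loop over every index
visited_all <- integer(0)
for (i in seq_along(ind)) visited_all <- c(visited_all, i)
visited_all    # one iteration per element of ind
```

Note that even with seq_along(), the quoted loop body overwrites the whole Flag column on every pass, so it would still need to assign one row at a time (for example dat$Flag[i]) to give the intended result.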


Steven Kang wrote:
ini <- as.Date("2010/1/1", "%Y/%m/%d")
# Generate arbitrary data frame consisting of date values
oc <- data.frame(Open = seq(ini, ini + 6, 1),
                 Close = seq(ini + 365, ini + 365 + 6, 1),
                 Open1 = seq(ini + 365*2, ini + 365*2 + 6, 1),
                 Close1 = seq(ini + 365*3, ini + 365*3 + 6, 1),
                 Open2 = seq(ini + 365*4, ini + 365*4 + 6, 1),
                 Close2 = seq(ini + 365*5, ini + 365*5 + 6, 1))
rownames(oc) <- c("AAA", "C", "AA", "A", "CC", "BB", "B")

oc
          Open      Close      Open1     Close1      Open2     Close2
AAA 2010-01-01 2011-01-01 2012-01-01 2012-12-31 2013-12-31 2014-12-31
C   2010-01-02 2011-01-02 2012-01-02 2013-01-01 2014-01-01 2015-01-01
AA  2010-01-03 2011-01-03 2012-01-03 2013-01-02 2014-01-02 2015-01-02
A   2010-01-04 2011-01-04 2012-01-04 2013-01-03 2014-01-03 2015-01-03
CC  2010-01-05 2011-01-05 2012-01-05 2013-01-04 2014-01-04 2015-01-04
BB  2010-01-06 2011-01-06 2012-01-06 2013-01-05 2014-01-05 2015-01-05
B   2010-01-07 2011-01-07 2012-01-07 2013-01-06 2014-01-06 2015-01-06

dat <- data.frame(Class = c("AAA", "C", "CC", "BB", "B", "A"),
                  Close_date = c(ini, ini, ini, ini+109, ini+39, ini+24),
                  stringsAsFactors = FALSE)
ind <- sapply(dat$Class, function(x) match(x, rownames(oc)))

for (i in length(ind))  {
    dat[["Flag"]] <- sapply(dat[["Close_date"]], function(x)
        ifelse((x >= oc[ind[[i]], 1] & x < oc[ind[[i]], 2]) |
               (x >= oc[ind[[i]], 3] & x < oc[ind[[i]], 4]) |
               (x >= oc[ind[[i]], 5] & x < oc[ind[[i]], 6]),
               "Valid", "Invalid"))
}
dat
  Class Close_date    Flag
1   AAA 2010-01-01 Invalid
2     C 2010-01-01 Invalid
3    CC 2010-01-01 Invalid
4    BB 2010-04-20   Valid
5     B 2010-02-09   Valid
6     A 2010-01-25   Valid
The first record (Class "AAA") is flagged as "Invalid" where it
should really be "Valid".

Any suggestions on resolving this would be great.

Many thanks.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

