Here are some sample data sets. I also tried making a combined field in each set, such as adq=paste(as.character(arr$Date), as.character(arr$quarter)), and similarly for the weather set, so that I would have a single unique key to compare, but that did not seem to help much.
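A combined key of this kind should work, provided the keys are compared with `%in%` (a set-membership test) rather than `==`, which compares position by position. The sketch below is a minimal illustration under stated assumptions: the dates are assumed to be already converted with `as.Date()`, and the two data frames are small hand-built copies of the sample data, keeping only the columns needed.

```r
# Hand-built copies of the sample data (only the columns needed here)
weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60:63)
arr <- data.frame(Date = as.Date("2009-01-01"),
                  quarter = rep(c(59L, 60L, 61L, 66L, 67L), c(2, 11, 4, 1, 1)))

# One key per row combining date and quarter-hour, e.g. "2009-01-01 60"
wkey <- paste(weather$Date, weather$quarter)
akey <- paste(arr$Date, arr$quarter)

# %in% asks whether each arrival's key occurs anywhere in weather,
# so the two data frames may have different numbers of rows
arr$gw <- as.integer(akey %in% wkey)
arr$gw
```

Because the key ties each quarter-hour to its own date, a good quarter-hour on one day cannot mark arrivals in the same quarter-hour on another day.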
Thanks,
Jim

On 1/17/10 5:50 PM, David Winsemius wrote:
> My guess (since we still have no data on which to test these ideas)
> is that you need either to merge() or to use a matrix created from the
> dates and qtr-hours entries in "gw", since matching on dates and hours
> separately will not uniquely classify the good qtr-hours within their
> proper corresponding dates. You want a structure (or a matching
> process) that takes:
>
>          qhr1   qhr2   qhr3   qhr4   .......
> date1    good   bad    good   bad
> date2    bad    good   good   good
> date3    bad    bad    bad    good
> .
> .
> .
>
> and lets you use the values in "arr" to get values in "gw". Notice
> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
> simply will not accomplish anything correct.
>
> Merging by multiple criteria (with the merge function) would do that,
> or you could construct a matrix whose entries were the categories
> good/bad. The table function could create the matrix for the purpose
> of using an indexed solution if you are dead-set against the merge
> concept.
>
> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>
>> Thank you Dennis.
>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>> weather$quarter)
>> seems to be what I want to do, but in fact, with the full data set, it
>> misidentifies the rows, so I think the error message must mean
>> something.
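The warning is indeed the clue. When `==` or `&` is given two vectors of different lengths, R compares them element by element and recycles the shorter one, so `weather$Date == arr$Date` pairs row 1 with row 1, row 2 with row 2, and so on, instead of asking whether a date matches anywhere. A tiny illustration with made-up vectors (the values are hypothetical, chosen only to show the difference):

```r
a <- c(59L, 60L, 60L, 61L, 66L)   # e.g. arrival quarter-hours (5 values)
b <- c(60L, 61L)                  # e.g. good-weather quarter-hours (2 values)

# Element-wise with recycling: 59==60, 60==61, 60==60, 61==61, 66==60.
# R warns here because 5 is not a multiple of 2.
pairwise <- a == b

# Set membership: is each element of a found anywhere in b?
member <- a %in% b

pairwise  # FALSE FALSE TRUE TRUE FALSE
member    # FALSE TRUE TRUE TRUE FALSE
```

On the posted sample every row has the same date, so the positional date comparison happens to be all TRUE and the result looks right; with many dates the rows no longer line up. When one length is an exact multiple of the other, no warning is given, but the comparison is still positional.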
>>
>> > arrr$Date <- as.Date(as.character(ewr$Date),format="%m/%d/%y")
>> > weather$Date <- as.Date(as.character(weather$Date),format="%m/%d/%y")
>> > gw = c(length(arrr))
>> > gw[1:length(arrr[,1])]=FALSE
>> > gw[arrr$Date==weather$Date & weather$quarter %in% arr$quarter]
>> Warning in `==.default`(arr$Date, weather$Date) :
>>   longer object length is not a multiple of shorter object length
>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>   longer object length is not a multiple of shorter object length
>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> [260] 0 0 0 0 0 0 0 0
>>
>> There are many, many more matches in the 99k-line arrival data set.
>>
>> Thanks a bunch,
>> Jim
>>
>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>> Hi:
>>>
>>> To read a data set from an R-help message into R, one uses
>>> read.table(textConnection("<verbatim text>"), ...)
>>>
>>> Your weather data set had
>>> (a) a variable name with a space in it, which R misread and had to be
>>>     altered manually;
>>> (b) a missing value with no NA, which R interpreted as an incomplete
>>>     line; again, it had to be altered manually.
>>>
>>> This is why David suggested the use of dput(), so that these vagaries
>>> don't have to be dealt with by those who are trying to help.
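For the weather sample that appears later in the thread, reading it with `read.table(textConnection(...))` might look like the sketch below. It assumes the header "Efficiency Val" is renamed to a single word (`efficiency`, a hypothetical choice) and uses `fill = TRUE`, so that the first data row, whose efficiency is blank, gets `NA` instead of stopping with an error.

```r
txt <- "Date minute hour quarter efficiency
1/1/09 5 15 60
1/1/09 15 15 61 72
1/1/09 30 15 62 63.3
1/1/09 45 15 63 85.4"

# fill = TRUE pads short rows with blank fields; blanks become NA in numeric columns
con <- textConnection(txt)
weather <- read.table(con, header = TRUE, fill = TRUE, stringsAsFactors = FALSE)
close(con)

# Convert the m/d/yy strings to Date, as elsewhere in the thread
weather$Date <- as.Date(weather$Date, format = "%m/%d/%y")

# dput() prints a text form that others can paste straight back into R
dput(weather)
```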
>>>
>>> That being said, for the example that you gave and the desired value
>>> that you wanted, try
>>>
>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in%
>>> weather$quarter)
>>>
>>> (I changed DateTime to Date in the arr data frame...)
>>>
>>> You'll get warnings like
>>>
>>> Warning messages:
>>> 1: In is.na(e1) | is.na(e2) :
>>>   longer object length is not a multiple of shorter object length
>>>
>>> but it seems to do the right thing. The first equality is there to
>>> constrain matches for quarter to be within the same day.
>>>
>>> For future reference,
>>>
>>>> dput(weather)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>> minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L
>>> ), quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)), .Names =
>>> c("Date", "minute", "hour", "quarter", "efficiency"), class = "data.frame",
>>> row.names = c(NA, -4L))
>>>> dput(arr)
>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> class = "factor"),
>>> weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>> 5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>> quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>> 60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L), ICAO = structure(c(6L,
>>> 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L, 3L, 5L, 3L, 3L, 6L, 6L,
>>> 2L, 4L, 3L), .Label = c("AAL", "AWE", "BTA", "CHQ", "CJC",
>>> "COA", "JBU", "NWA"), class = "factor"), Flight = structure(c(15L,
>>> 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L, 1L, 3L, 13L, 9L, 10L,
>>> 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842", "AWE307", "BTA1234",
>>> "BTA2064", "BTA2085", "BTA2347", "BTA2405", "BTA2916", "BTA3072",
>>> "BTA3086", "CHQ5312", "CJC3225", "CJC3359", "COA1166", "COA349",
>>> "COA855", "COA886", "JBU554", "NWA9934"), class = "factor"),
>>> gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>> FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>> "ICAO", "Flight", "gw"), row.names = c(NA, -19L), class = "data.frame")
>>>
>>> These can be copied and pasted directly into an R session without
>>> modification.
>>>
>>> HTH,
>>> Dennis
>>>
>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesr...@gmail.com> wrote:
>>>
>>> On 1/17/10 1:06 PM, David Winsemius wrote:
>>>>
>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>>>
>>>>> I don't think it is that simple because it is not a one-to-one match.
>>>>> In the arr data frame, there are many arrivals in a quarter hour with
>>>>> good weather on a given day. So I need to match the date and the
>>>>> quarter hour.
>>>>>
>>>>> And all of the rows in the weather data frame are times with good
>>>>> weather--unique date + quarter hour. That is why I needed the loop.
>>>>> For each date and quarter hour in weather, I want to mark all the
>>>>> entries with the corresponding date and weather as TRUE in the arr$gw
>>>>> column.
>>>>>
>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>>>> gooddates = function(all, good) {
>>>>>   la = length(all) # All the arrivals
>>>>>   lw = length(good) # The good 15-minute periods
>>>>>   for(j in 1:lw) {
>>>>>     d=good$Date[j]
>>>>>     q=good$quarter[j]
>>>>>     all$gw[all$Date==d && all$quarter==q]=TRUE
>>>>
>>>>
>>>> You are attempting a vectorized test and assignment with "&&" which
>>>> seems unlikely to succeed, but even then I am not sure your problems
>>>> would be over. (I'm also guessing that you might not have reported a
>>>> warning.)
>>>
>>> Why shouldn't the && succeed? You are correct there: I do get items
>>> if I use either part of this test alone, but when I insert the &&, I
>>> get no hits.
>>> And I got no warnings.
>>>>
>>>> Why not merge arr to gw by date and quarter?
>>>
>>> The sets contain different data, and the only thing I want from the
>>> weather set is the fact that it has an entry for a given date and
>>> time.
>>>>
>>>> Answering these questions would be greatly speeded up with a small
>>>> sample dataset. Are you aware of the virtues of the dput function?
>>>>
>>>
>>> What I want is for a 1 to be in the gw column in quarters
>>> 60, 61, 62, 63, ...
>>>
>>> For example, here is some data from the good weather set:
>>>
>>> Date    minute  hour  quarter  Efficiency Val
>>> 1/1/09       5    15       60
>>> 1/1/09      15    15       61  72
>>> 1/1/09      30    15       62  63.3
>>> 1/1/09      45    15       63  85.4
>>>
>>> And this is from the arrivals set:
>>>
>>> DateTime  weekday  month  quarter  ICAO  Flight   gw
>>> 1/1/09          5      1       59  COA   COA349    0
>>> 1/1/09          5      1       59  NWA   NWA9934   0
>>> 1/1/09          5      1       60  JBU   JBU554    0
>>> 1/1/09          5      1       60  BTA   BTA2347   0
>>> 1/1/09          5      1       60  COA   COA886    0
>>> 1/1/09          5      1       60  BTA   BTA2916   0
>>> 1/1/09          5      1       60  CJC   CJC3225   0
>>> 1/1/09          5      1       60  BTA   BTA2085   0
>>> 1/1/09          5      1       60  BTA   BTA2064   0
>>> 1/1/09          5      1       60  AAL   AAL842    0
>>> 1/1/09          5      1       60  BTA   BTA1234   0
>>> 1/1/09          5      1       60  CJC   CJC3359   0
>>> 1/1/09          5      1       60  BTA   BTA3072   0
>>> 1/1/09          5      1       61  BTA   BTA3086   0
>>> 1/1/09          5      1       61  COA   COA1166   0
>>> 1/1/09          5      1       61  COA   COA855    0
>>> 1/1/09          5      1       61  AWE   AWE307    0
>>> 1/1/09          5      1       66  CHQ   CHQ5312   0
>>> 1/1/09          5      1       67  BTA   BTA2405   0
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
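On why `&&` fails here: `&&` is the scalar "and". It looks at a single value on each side (R of that era silently used only the first element; since R 4.3.0 an operand longer than one is an error), so it cannot produce the per-row logical vector that `[` needs, whereas `&` works element by element. Separately, `length()` of a data frame counts its columns, not its rows, so the loop bound needs `nrow()`. A sketch of the function with both points fixed, using `Date` rather than `POSIXlt` and a hand-built copy of the sample data:

```r
gooddates <- function(all, good) {
  all$gw <- FALSE                        # start with every arrival unmarked
  for (j in seq_len(nrow(good))) {       # nrow(): length() would count columns
    d <- good$Date[j]
    q <- good$quarter[j]
    # & is element-wise, giving one TRUE/FALSE per arrival row
    all$gw[all$Date == d & all$quarter == q] <- TRUE
  }
  all
}

weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60:63)
arr <- data.frame(Date = as.Date("2009-01-01"),
                  quarter = rep(c(59L, 60L, 61L, 66L, 67L), c(2, 11, 4, 1, 1)))

arr <- gooddates(arr, weather)
table(arr$quarter, arr$gw)
```

The loop makes one full pass over the arrivals per good quarter-hour, so on a 99k-row arrival table a single vectorized match (a combined key with `%in%`, or `merge()`) is likely to be considerably faster.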
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
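David's two suggestions can be sketched as follows, again on a hand-built copy of the sample data (the `Flight` values are hypothetical placeholders). The first merges on both keys while keeping every arrival; the second builds the date-by-quarter-hour matrix with `table()` and indexes it with a two-column matrix of (row, column) positions, one row per arrival.

```r
weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60:63,
                      efficiency = c(NA, 72, 63.3, 85.4))
arr <- data.frame(Date = as.Date("2009-01-01"),
                  quarter = rep(c(59L, 60L, 61L, 66L, 67L), c(2, 11, 4, 1, 1)),
                  Flight = paste0("F", 1:19))   # hypothetical flight ids

## 1. merge() on both keys, keeping only the fact that a weather row exists
good <- unique(weather[, c("Date", "quarter")])
good$gw <- TRUE
m <- merge(arr, good, by = c("Date", "quarter"), all.x = TRUE)
m$gw[is.na(m$gw)] <- FALSE    # arrivals with no weather entry are not good
# merge() sorts the result by the key columns, so row order may differ from arr

## 2. A date x quarter-hour lookup matrix built with table()
dlev <- sort(unique(as.character(c(arr$Date, weather$Date))))
qlev <- sort(unique(c(arr$quarter, weather$quarter)))
tab <- table(factor(as.character(weather$Date), levels = dlev),
             factor(weather$quarter, levels = qlev))
# Each arrival's (row, column) position in the matrix
idx <- cbind(match(as.character(arr$Date), dlev), match(arr$quarter, qlev))
arr$gw <- as.vector(tab[idx]) > 0   # TRUE where weather had an entry
```

Building the factor levels from both data sets keeps every arrival's date and quarter-hour inside the matrix, so the index never runs out of bounds.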