Marc - very nice. On Wed, Jun 19, 2019 at 6:30 PM Marc Schwartz <marc_schwa...@me.com> wrote:
> Hi All, > > Is there a reason that the following was not considered? > > > DF > Date O3_Conc > 1 2000-01-01 0.033 > 2 2000-01-02 0.023 > 3 2000-01-03 0.025 > 4 2000-01-04 0.041 > 5 2000-01-05 0.063 > 6 2000-01-06 0.028 > 7 2000-01-07 0.068 > 8 2000-01-08 0.048 > 9 2000-01-09 0.037 > 10 2000-01-10 0.042 > 11 2000-01-11 0.027 > 12 2000-01-12 0.035 > 13 2000-01-13 0.063 > 14 2000-01-14 0.035 > 15 2000-01-15 0.042 > 16 2000-01-16 0.028 > > > str(DF) > 'data.frame': 16 obs. of 2 variables: > $ Date : Date, format: "2000-01-01" ... > $ O3_Conc: num 0.033 0.023 0.025 0.041 0.063 0.028 0.068 0.048 0.037 > 0.042 ... > > > > aggregate(O3_Conc ~ cut(Date, breaks = "14 days"), data = DF, max) > cut(Date, breaks = "14 days") O3_Conc > 1 2000-01-01 0.068 > 2 2000-01-15 0.042 > > > See ?cut.Date > > Note that there is an error in Aisyah's original post, in that the value > for the max concentration in the second 14 day window is 0.042, not 0.061. > > Also, if one wanted to use actual calendar weeks, with a week starting on > Sunday, as opposed to 14 day windows starting from the minimum date, you > could use: > > > aggregate(O3_Conc ~ cut(Date, breaks = "2 weeks", start.on.monday = > FALSE), data = DF, max) > cut(Date, breaks = "2 weeks", start.on.monday = FALSE) O3_Conc > 1 1999-12-26 0.068 > 2 2000-01-09 0.063 > > > If desired, you could pre-create a column in the DF with the result of > cut(): > > DF$TimeStartDays <- cut(DF$Date, breaks = "14 days") > DF$TimeStartWks <- cut(DF$Date, breaks = "2 weeks", start.on.monday = > FALSE) > > > DF > Date O3_Conc TimeStartDays TimeStartWks > 1 2000-01-01 0.033 2000-01-01 1999-12-26 > 2 2000-01-02 0.023 2000-01-01 1999-12-26 > 3 2000-01-03 0.025 2000-01-01 1999-12-26 > 4 2000-01-04 0.041 2000-01-01 1999-12-26 > 5 2000-01-05 0.063 2000-01-01 1999-12-26 > 6 2000-01-06 0.028 2000-01-01 1999-12-26 > 7 2000-01-07 0.068 2000-01-01 1999-12-26 > 8 2000-01-08 0.048 2000-01-01 1999-12-26 > 9 2000-01-09 0.037 2000-01-01 2000-01-09 > 10 2000-01-10 0.042 2000-01-01 2000-01-09 > 11 2000-01-11 0.027 2000-01-01 2000-01-09 > 12 2000-01-12 0.035 2000-01-01 2000-01-09 > 13 2000-01-13 0.063 2000-01-01 2000-01-09 > 14 2000-01-14 0.035 2000-01-01 2000-01-09 > 15 2000-01-15 0.042 2000-01-15 2000-01-09 > 16 2000-01-16 0.028 2000-01-15 2000-01-09 > > and then use: > > > aggregate(O3_Conc ~ TimeStartDays, data = DF, max) > TimeStartDays O3_Conc > 1 2000-01-01 0.068 > 2 2000-01-15 0.042 > > > aggregate(O3_Conc ~ TimeStartWks, data = DF, max) > TimeStartWks O3_Conc > 1 1999-12-26 0.068 > 2 2000-01-09 0.063 > > > Regards, > > Marc Schwartz > > > > > On Jun 19, 2019, at 10:45 AM, Eric Berger <ericjber...@gmail.com> wrote: > > > > I just realized that my method as written only works if the entire date > > range is within a single calendar year. > > Here is a revised version that should handle the general case. > > > > In this version I assume that the conversion from the original numbers to > > dates has already been done > > e.g. it can be done using ymd() as I showed before > > > > library(dplyr) > > library(lubridate) > > > > ## create some example data for 36 months from 2010 > > d <- seq(from=as.Date("2010-01-01"),to=as.Date("2012-12-31"),by="1 day") > > conc <- runif(length(d)) # concentrations > > df <- data.frame( dt=d, conc=conc) > > df2 <- dplyr::mutate( df, wkpair=as.integer(floor((0:(nrow(df)-1))/14)) ) > > df3 <- dplyr::group_by(df2,wkpair) %>% > > dplyr::summarise(from=min(dt), to=max(dt), maxconc=max(conc)) %>% > > dplyr::select(from,to,maxconc) > > df3 > > > > # A tibble: 79 x 3 > > from to maxconc > > <date> <date> <dbl> > > 1 2010-01-01 2010-01-14 0.928 > > 2 2010-01-15 2010-01-28 0.834 > > 3 2010-01-29 2010-02-11 0.989 > > 4 2010-02-12 2010-02-25 0.983 > > 5 2010-02-26 2010-03-11 0.898 > > 6 2010-03-12 2010-03-25 0.864 > > 7 2010-03-26 2010-04-08 0.920 > > 8 2010-04-09 2010-04-22 0.925 > > 9 2010-04-23 2010-05-06 0.998 > > 10 2010-05-07 2010-05-20 0.761 > > # ... with 69 more rows > > > > On Wed, Jun 19, 2019 at 5:08 PM Eric Berger <ericjber...@gmail.com> > wrote: > > > >> Hi Bert, > >> I agree that our interpretation is slightly different, which is why I > >> wrote: > >> "If one can take the actual week of the year as an acceptable definition > >> of week, then here's my approach." > >> > >> Regards, > >> Eric > >> > >> On Wed, Jun 19, 2019 at 5:04 PM Bert Gunter <bgunter.4...@gmail.com> > >> wrote: > >> > >>> Eric: > >>> > >>> I believe you're doing something different than I did. I broke up each > >>> month into biweekly periods, 2+ per month. You seem to be grouping the > >>> overall entire period into biweekly intervals -- apologies if I'm > wrong, > >>> but if I understood correctly, that's not the same thing. I do not know > >>> which of us -- if either -- has interpreted her query correctly. > >>> > >>> Cheers, > >>> Bert > >>> > >>> On Wed, Jun 19, 2019 at 2:35 AM Eric Berger <ericjber...@gmail.com> > >>> wrote: > >>> > >>>> Hi Siti, > >>>> I didn't test Bert's code but I assume it's fine. :-) > >>>> I would take a different approach than Bert. I prefer to use a package > >>>> such as lubridate to handle the date wrangling, and a package such as > dplyr > >>>> to handle the grouping and max extraction. > >>>> It may be overkill for this problem, but these are great packages to > >>>> become familiar with. > >>>> If one can take the actual week of the year as an acceptable > definition > >>>> of week, then here's my approach. > >>>> > >>>> library(lubridate) > >>>> library(dplyr) > >>>> > >>>> # Step 1: start with Bert's code to create sample data > >>>> ## create some example data for 3 months in 2000 > >>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates > >>>> conc <- runif(length(d)) # concentrations > >>>> > >>>> # Step 2: collect the data into a data frame > >>>> df <- data.frame( dt=d, conc=conc) > >>>> > >>>> # Step 3: use lubridate's ymd() function to parse the dates, its > week() > >>>> function to identify the week of the year, and define the new column > >>>> 'wkpair' that groups the weeks 2-at-a-time > >>>> df2 <- dplyr::mutate( df, > >>>> wkpair=as.integer(floor(lubridate::week(lubridate::ymd(dt) )/2)) ) > >>>> > >>>> # Step 4: group by the wkpair and use dplyr's summarise to get the > info > >>>> you wanted > >>>> df3 <- dplyr::group_by(df2,wkpair) %>% > >>>> dplyr::summarise( from=min(dt), to=max(dt), > maxconc=max(conc)) > >>>> %>% > >>>> dplyr::select(from,to,maxconc) > >>>> > >>>> df3 > >>>> > >>>> # A tibble: 6 x 3 > >>>> from to maxconc > >>>> <dbl> <dbl> <dbl> > >>>> 1 20000113 20000121 0.963 > >>>> 2 20000122 20000204 0.988 > >>>> 3 20000205 20000218 0.939 > >>>> 4 20000219 20000303 0.883 > >>>> 5 20000304 20000317 0.863 > >>>> 6 20000318 20000330 0.765 > >>>> > >>>> HTH, > >>>> Eric > >>>> > >>>> > >>>> > >>>> On Tue, Jun 18, 2019 at 9:39 PM Bert Gunter <bgunter.4...@gmail.com> > >>>> wrote: > >>>> > >>>>> My apologies. I negected to cc r-help. -- Bert > >>>>> > >>>>> > >>>>> > >>>>> On Tue, Jun 18, 2019 at 11:21 AM Bert Gunter <bgunter.4...@gmail.com > > > >>>>> wrote: > >>>>> > >>>>>> > >>>>>> I assume that 20000215 means year 2000, month 2, day 15. > >>>>>> I also assume that you want maxes for the first 2 weeks of a month, > >>>>> the > >>>>>> second 2 weeks, and any remaining days. > >>>>>> I also assume that this might be desired for arbitrary numbers of > >>>>> years, > >>>>>> months, and days. > >>>>>> > >>>>>> The following is one way to do this. As it's a slight pain to cut > and > >>>>>> paste email data as text into R (use ?dput or R code to run to > provide > >>>>>> example data instead), I just made up my own. You'll have to do the > >>>>>> following within a data frame through extraction or by using with() > of > >>>>>> course. > >>>>>> > >>>>>> ## create some example data for 3 months in 2000 > >>>>>> d<- 2e7 +c(113:131,201:228, 301:330) ## dates > >>>>>> conc <- runif(length(d)) # concentrations > >>>>>> > >>>>>> ## convert the date to character to extract year, month, and day > >>>>>> cdate <- as.character(d) > >>>>>> ## use substr to to the extraction > >>>>>> year <- substr(cdate,1,4) > >>>>>> mon <- substr(cdate,5,6) > >>>>>> day <- substr(cdate, 7,8) > >>>>>> > >>>>>> ## convert day to numeric and use cut() to group into the biweekly > >>>>> periods. > >>>>>> d14 <- cut(as.numeric(day), c(0,14.5,28.5, 32)) > >>>>>> > >>>>>> ## Use tapply() to create your desired table of results. > >>>>>> tapply(conc, list(year, d14, mon), max, na.rm = TRUE) > >>>>>> > >>>>>> ## Results > >>>>>> > >>>>>> , , 01 > >>>>>> > >>>>>> (0,14.5] (14.5,28.5] (28.5,32] > >>>>>> 2000 0.7357389 0.9655391 0.7962965 > >>>>>> > >>>>>> , , 02 > >>>>>> > >>>>>> (0,14.5] (14.5,28.5] (28.5,32] > >>>>>> 2000 0.8193979 0.9487207 NA > >>>>>> > >>>>>> , , 03 > >>>>>> > >>>>>> (0,14.5] (14.5,28.5] (28.5,32] > >>>>>> 2000 0.9718919 0.9997093 0.168659 > >>>>>> > >>>>>> > >>>>>> Cheers, > >>>>>> Bert > >>>>>> > >>>>>> Bert Gunter > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Tue, Jun 18, 2019 at 8:53 AM SITI AISYAH BINTI ZAKARIA < > >>>>>> aisyahzaka...@unimap.edu.my> wrote: > >>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> I'm Aisyah..I have a problem to run my R coding. I want to select > >>>>> maximum > >>>>>>> value according to week. > >>>>>>> > >>>>>>> here is my data > >>>>>>> > >>>>>>> Date O3_Conc > >>>>>>> 20000101 0.033 > >>>>>>> 20000102 0.023 > >>>>>>> 20000103 0.025 > >>>>>>> 20000104 0.041 > >>>>>>> 20000105 0.063 > >>>>>>> 20000106 0.028 > >>>>>>> 20000107 0.068 > >>>>>>> 20000108 0.048 > >>>>>>> 20000109 0.037 > >>>>>>> 20000110 0.042 > >>>>>>> 20000111 0.027 > >>>>>>> 20000112 0.035 > >>>>>>> 20000113 0.063 > >>>>>>> 20000114 0.035 > >>>>>>> 20000115 0.042 > >>>>>>> 20000116 0.028 > >>>>>>> > >>>>>>> I want to find the max value from column O3_Conc for only 14 days > >>>>> that > >>>>>>> refer to biweekly in month. And the next 14 days for the max value. > >>>>>>> > >>>>>>> I hope that I can get the result like this: > >>>>>>> > >>>>>>> Date Max O3_Conc > >>>>>>> 20000101 - 20000114 0.068 > >>>>>>> 20000115 - 20000129 0.061 > >>>>>>> > >>>>>>> I try many coding but still unavailable. > >>>>>>> > >>>>>>> this example my coding > >>>>>>> > >>>>>>> library(plyr) > >>>>>>> data.frame(CA0003) > >>>>>>> > >>>>>>> # format weeks as per requirement (replace "00" with "52" and > >>>>>>> adjust corresponding year) > >>>>>>> tmp <- list() > >>>>>>> tmp$y <- format(df$Date, format="%Y") > >>>>>>> tmp$w <- format(df$Date, format="%U") > >>>>>>> tmp$y[tmp$w=="00"] <- > >>>>> as.character(as.numeric(tmp$y[tmp$w=="00"]) - > >>>>>>> 14) > >>>>>>> tmp$w[tmp$w=="00"] <- "884" > >>>>>>> df$week <- paste(tmp$y, tmp$w, sep = "-") > >>>>>>> > >>>>>>> # get summary > >>>>>>> df2 <- ddply(df, .(week),transform, O3_Conc=max(O3_Conc)) > >>>>>>> > >>>>>>> # include week ending date > >>>>>>> tmp$week.ending <- lapply(df2$week, function(x) rev(df[df$week > >>>>> ==x, > >>>>>>> "O3_Conc"])[[1]]) > >>>>>>> df2$week.ending <- sapply(tmp$week.ending, max(O3_Conc, TRUE) > >>>>>>> > >>>>>>> output > >>>>>>> Site_Id Site_Location > >>>>> Date > >>>>>>> Year O3_Conc Month Day week > >>>>>>> 1 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000101 > >>>>>>> 2000 0.033 1 1 NULL-NULL > >>>>>>> 2 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000102 > >>>>>>> 2000 0.023 1 2 NULL-NULL > >>>>>>> 3 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000103 > >>>>>>> 2000 0.025 1 3 NULL-NULL > >>>>>>> 4 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000104 > >>>>>>> 2000 0.041 1 4 NULL-NULL > >>>>>>> 5 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000105 > >>>>>>> 2000 0.063 1 5 NULL-NULL > >>>>>>> 6 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000106 > >>>>>>> 2000 0.028 1 6 NULL-NULL > >>>>>>> 7 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000107 > >>>>>>> 2000 0.068 1 7 NULL-NULL > >>>>>>> 8 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000108 > >>>>>>> 2000 0.048 1 8 NULL-NULL > >>>>>>> 9 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000109 > >>>>>>> 2000 0.037 1 9 NULL-NULL > >>>>>>> 10 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000110 > >>>>>>> 2000 0.042 1 10 NULL-NULL > >>>>>>> 11 CA0003 Sek. Keb. Cederawasih, Taman Inderawasih, Perai > >>>>> 20000111 > >>>>>>> 2000 0.027 1 11 NULL-NULL > >>>>>>> > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.