Re: [R] Query about calculating the monthly average of daily data columns

Jim Lemon Fri, 13 Sep 2019 17:36:10 -0700

Sorry, forgot to include the list.

On Sat, Sep 14, 2019 at 10:27 AM Jim Lemon <drjimle...@gmail.com> wrote:
>
> See inline
>
> On Fri, Sep 13, 2019 at 11:20 PM Subhamitra Patra 
> <subhamitra.pa...@gmail.com> wrote:
>>
>> Dear Sir,
>>
>> Yes, I understood the logic. But, still, I have a few queries that I 
>> mentioned below your answers.
>>
>>> "# if you only have to get the monthly averages, it can be done this way
>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2)
>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)"
>>>
>>> B. Here, I need to define the no. of months, and years separately, right? 
>>> or else what 2, and 3 (in bold) indicates?
>>
>>
>> To get the grouping variable of sequential months that you want, you only 
>> need the month and year values of the dates in the first column. First I 
>> used the "strsplit" function to split the date field at the hyphens, then 
>> used "sapply" to extract ("[") the second (month) and third (year) parts as 
>> two new columns. Because you have more than one year of data, you need the 
>> year values or you will group all Januarys, all Februarys and so on. Notice 
>> how I pass both of the new columns as a list (a data frame is a type of 
>> list) in the call to get the mean of each month.
>>
>> 1. Here, as per my understanding, the "3" indicates the 3rd year, right? 
>> But, you showed an average for 2 months of the same year. Then, what "3" in 
>> the  spdat$year object indicate?
>
>
> No, as I explained in the initial email and below, the "strsplit" function 
> takes one or more strings (your dates) and breaks them at the specified 
> character ("-"). So
>
> strsplit("1-1-1994","-")
> [[1]]
> [1] "1"    "1"    "1994"
>
> That is passed to the "sapply" function that applies the extraction ("[") 
> operator to the result of "strsplit". The "3" indicates that you want to 
> extract the third element, in this case, the year.
>
> > sapply(strsplit("1-1-1994","-"),"[",3)
> [1] "1994"
>
> So by splitting the dates and extracting the second (month) and third (year) 
> element from each date, we have all the information needed to create a 
> grouping variable for monthly averages.
>
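
A minimal sketch of the split-and-extract step on more than one date. The
four dates are made up for illustration and assume the day-month-year layout
with hyphens used in the example above; the "yearmonth" label is a
hypothetical name, not something from the original code.

```r
# Hypothetical sample dates (day-month-year, separated by hyphens)
dates <- c("3-1-1994", "4-1-1994", "1-2-1994", "2-1-1995")

# strsplit returns a list with one character vector per date
parts <- strsplit(dates, "-")

# "[" with 2 picks the second piece (month), with 3 the third piece (year)
month <- sapply(parts, "[", 2)
year  <- sapply(parts, "[", 3)

# one label per row that keeps January 1994 apart from January 1995
yearmonth <- paste(year, month, sep = "-")
print(data.frame(dates, month, year, yearmonth))
```

The 2 and 3 are positions within each split date, not a count of months or
years, so the same two lines work however many years the data cover.
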
>>
>>
>>> C. From this part, I got the exact average values of both January and 
>>> February of 1994 for country A, and B. But, in code, I have a query that I 
>>> need to define  spdat$returnA, and  spdat$returnB separately before writing 
>>> this code, right? Like this, I need to define for each 84 countries 
>>> separately with their respective number of months, and years before writing 
>>> this code, right?
>>
>>
>> I don't think so. Because I don't know what your data looks like, I am 
>> guessing that for each row, it has columns for each of the 84 countries. I 
>> don't know what these columns are named, either. Maybe:
>>
>> date         Australia   Belarus   ...   Zambia
>> 01/01/1994   20          21        ...   22
>> ...
>>
>> Here, due to my misunderstanding about the code, I was wrong. But, what data 
>> structure you guessed, it is absolutely right that for each row, I have 
>> columns for each of the 84 countries. So, I think, I need to define the date 
>> column with no. of months, and years once for all the countries. Therefore, 
>> I got my answer to the first and third question in the previous email (what 
>> you suggested) that I do not need to define the column of each country, as the 
>> date, and no. of observations are same for all countries. But, the no. of 
>> days are different for each month, and similarly, for each year. So, I think 
>> I need to define date for each year separately.  Hence, I have given an 
>> example of 12 months, for 2 years (i.e. 1994, and 1995), and have written 
>> the following code. Please correct me in case I am wrong.
>>
>>  spdat<-data.frame(
>>   
>> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20),rep(3,23),
>>  rep(4,21), 
>> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22)),rep(1994,260)
>>  
>> dates1=paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20),rep(3,23),
>>  rep(4,20), 
>> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21)),rep(1995,259)
>>  ,sep="-")
>>
> First, you don't have to recreate the data that you already have. I did 
> because I don't have it and have to guess what it looks like. Remember 
> neither I nor any of the others who have offered help have your data or even 
> a representative sample. If you tried the code above, you surely must know 
> that it doesn't work. I could create code that would produce the dates from 
> 1-1-1994 to 31-12-1995 or any other stretch you would like, but it would only 
> confuse you more.  _You already have the dates in your data file._ What I 
> have shown you is how to use those dates to create the grouping variable that 
> you want.
>
>> Concerning the exporting of structure of the dataset to excel, I will have 
>> 12*84 matrix. But, please suggest me the way to proceed for the large 
>> sample. I have mentioned below what I understood from your code. Please 
>> correct me if I am wrong.
>> 1. I need to define the date for each year as the no. of days in each month 
>> are different for each year (as mentioned in my above code). For instance, 
>> in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
>> 2. Need to define the date column as character.
>> 3. Need to define the monthly average for each month, and year. So, now code 
>> will be as follows.
>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)  
>>   %%%%As I need all months average sequentially.
>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>>
>> Here, this meaning of "3", I am really unable to get
>
>
> You have missed the point here, as above. I didn't mean to suggest that you 
> had to recreate the dates that you already have. What I did was to show how 
> you could use the dates that you already have to create a grouping variable 
> for your calculation.
>>
>> .
>>
>> 4. Need to define each country with each month and year as mentioned in the 
>> last part of your code.
>>
> What I did was to add the month and year of each row as two separate columns 
> of data. You should be able to see that by looking at spdat after the 
> "strsplit/sapply" operation. Then you have in each row the returns from your 
> 84 countries _and_ the month/year for that row. When I used the "by" function 
> to get the monthly means from spdat, I showed you how to use the same code on 
> your data frame, after creating the month and year columns, to get the 
> monthly average return for the 84 countries for as many years as you have. As 
> this should return a matrix of successive months as rows and countries as 
> columns, you should easily be able to import this into Excel.
>
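
A rough sketch of the whole calculation on a toy data frame. It uses
"aggregate" rather than the "by" call from the earlier message, because
"aggregate" returns an ordinary data frame that is easy to export. The
country names, the return values and the output file name are all invented
for illustration; the real data would have 84 country columns.

```r
# Toy data standing in for the real file: dates plus one column per country
spdat <- data.frame(
  dates     = c("3-1-1994", "4-1-1994", "1-2-1994", "2-2-1994", "2-1-1995"),
  Australia = c(1, 3, 5, 7, 9),
  Belarus   = c(2, 4, 6, 8, 10),
  stringsAsFactors = FALSE   # the dates must be character for strsplit
)

# add month and year columns taken from the existing dates
spdat$month <- as.numeric(sapply(strsplit(spdat$dates, "-"), "[", 2))
spdat$year  <- as.numeric(sapply(strsplit(spdat$dates, "-"), "[", 3))

# every column that is not a date or grouping column is treated as a country
countries <- setdiff(names(spdat), c("dates", "month", "year"))

# mean of each country column within each month/year combination
monthly <- aggregate(spdat[countries],
                     by = list(month = spdat$month, year = spdat$year),
                     FUN = mean)

# put the rows in calendar order: successive months as rows
monthly <- monthly[order(monthly$year, monthly$month), ]
print(monthly)

# To open the result in Excel, it could be written out as CSV, e.g.
# write.csv(monthly, "monthly_means.csv", row.names = FALSE)
```

Nothing here depends on how many trading days a month has: rows are grouped
by the month and year found in each date, so 21 days in one January and 22 in
the next are handled the same way.
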
> The reason I did it this way was to illustrate how to define the grouping 
> variable from the existing information and perform an easily understood 
> calculation. I thought that using methods that automatically perform the 
> operations I used might allow you to get the result without understanding how 
> you had gotten it. Like the data I didn't have, I took a guess at how much my 
> example would help you understand what was happening and give you the skills 
> to do it yourself. I do hope that you get to the point where you are able to 
> think of my example as unsophisticated, for I could have done it with 
> dplyr/ts/reshape and the rest.
>
> Jim


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
