Sorry, forgot to include the list.

On Sat, Sep 14, 2019 at 10:27 AM Jim Lemon <drjimle...@gmail.com> wrote:
>
> See inline
>
> On Fri, Sep 13, 2019 at 11:20 PM Subhamitra Patra
> <subhamitra.pa...@gmail.com> wrote:
>>
>> Dear Sir,
>>
>> Yes, I understood the logic. But, still, I have a few queries that I
>> mentioned below your answers.
>>
>>> "# if you only have to get the monthly averages, it can be done this way
>>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2)
>>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)"
>>>
>>> B. Here, I need to define the no. of months, and years separately, right?
>>> or else what 2, and 3 (in bold) indicates?
>>
>>
>> To get the grouping variable of sequential months that you want, you only
>> need the month and year values of the dates in the first column. First I
>> used the "strsplit" function to split the date field at the hyphens, then
>> used "sapply" to extract ("[") the second (month) and third (year) parts as
>> two new columns. Because you have more than one year of data, you need the
>> year values or you will group all Januarys, all Februarys and so on. Notice
>> how I pass both of the new columns as a list (a data frame is a type of
>> list) in the call to get the mean of each month.
>>
>> 1. Here, as per my understanding, the "3" indicates the 3rd year, right?
>> But, you showed an average for 2 months of the same year. Then, what "3" in
>> the spdat$year object indicate?
>
>
> No, as I explained in the initial email and below, the "strsplit" function
> takes one or more strings (your dates) and breaks them at the specified
> character ("-"), so
>
> strsplit("1-1-1994","-")
> [[1]]
> [1] "1"    "1"    "1994"
>
> That is passed to the "sapply" function that applies the extraction ("[")
> operator to the result of "strsplit". The "3" indicates that you want to
> extract the third element, in this case, the year.
>
>
> sapply(strsplit("1-1-1994","-"),"[",3)
> [1] "1994"
>
> So by splitting the dates and extracting the second (month) and third (year)
> element from each date, we have all the information needed to create a
> grouping variable for monthly averages.
>
>>
>>
>>> C. From this part, I got the exact average values of both January and
>>> February of 1994 for country A, and B. But, in code, I have a query that I
>>> need to define spdat$returnA, and spdat$returnB separately before writing
>>> this code, right? Like this, I need to define for each 84 countries
>>> separately with their respective number of months, and years before writing
>>> this code, right?
>>
>>
>> I don't think so. Because I don't know what your data looks like, I am
>> guessing that for each row, it has columns for each of the 84 countries. I
>> don't know what these columns are named, either. Maybe:
>>
>> date        Australia  Belarus  ...  Zambia
>> 01/01/1994  20         21            22
>> ...
>>
>> Here, due to my misunderstanding about the code, I was wrong. But, what data
>> structure you guessed, it is absolutely right that for each row, I have
>> columns for each of the 84 countries. So, I think, I need to define the date
>> column with no. of months, and years once for all the countries. Therefore,
>> I got my answer to the first and third question in the previous email (what
>> you suggested) that I no need to define the column of each country, as the
>> date, and no. of observations are same for all countries. But, the no. of
>> days are different for each month, and similarly, for each year. So, I think
>> I need to define date for each year separately. Hence, I have given an
>> example of 12 months, for 2 years (i.e. 1994, and 1995), and have written
>> the following code. Please correct me in case I am wrong.
>>
>> spdat<-data.frame(
>>
>> dates=paste(c(1:21,1:20,1:23,1:21,1:22,1:22,1:21,1:23,1:22,1:21,1:22,1:22),c(rep(1,21),rep(2,20),rep(3,23),
>> rep(4,21),
>> rep(5,22),rep(6,22),rep(7,21),rep(8,23),rep(9,22),rep(10,21),rep(11,22),rep(12,22)),rep(1994,260)
>>
>> dates1=paste(c(1:22,1:20,1:23,1:20,1:23,1:22,1:21,1:23,1:21,1:22,1:22,1:21),c(rep(1,22),rep(2,20),rep(3,23),
>> rep(4,20),
>> rep(5,23),rep(6,22),rep(7,21),rep(8,23),rep(9,21),rep(10,21),rep(11,22),rep(12,21)),rep(1995,259)
>> ,sep="-")
>>
> First, you don't have to recreate the data that you already have. I did
> because I don't have it and have to guess what it looks like. Remember
> neither I nor any of the others who have offered help have your data or even
> a representative sample. If you tried the code above, you surely must know
> that it doesn't work. I could create code that would produce the dates from
> 1-1-1994 to 31/12/1995 or any other stretch you would like, but it would only
> confuse you more. _You already have the dates in your data file._ What I
> have shown you is how to use those dates to create the grouping variable that
> you want.
>
>> Concerning the exporting of structure of the dataset to excel, I will have
>> 12*84 matrix. But, please suggest me the way to proceed for the large
>> sample. I have mentioned below what I understood from your code. Please
>> correct me if I am wrong.
>> 1. I need to define the date for each year as the no. of days in each month
>> are different for each year (as mentioned in my above code). For instance,
>> in my data file, Jan 1994 has 21 days while Jan 1995 has 22 days.
>> 2. Need to define the date column as character.
>> 3. Need to define the monthly average for each month, and year. So, now code
>> will be as follows.
>> spdat$month<-sapply(strsplit(spdat$dates,"-"),"[",2,3,4,5,6,7,8,9,10,11,12)
>> %%%%As I need all months average sequentially.
>> spdat$year<-sapply(strsplit(spdat$dates,"-"),"[",3)
>>
>> Here, this meaning of "3", I am really unable to get
>
>
> You have missed the point here, as above. I didn't mean to suggest that you
> had to recreate the dates that you already have. What I did was to show how
> you could use the dates that you already have to create a grouping variable
> for your calculation.
>>
>> .
>>
>> 4. Need to define each country with each month and year as mentioned in the
>> last part of your code.
>>
> What I did was to add the month and year of each row as two separate columns
> of data. You should be able to see that by looking at spdat after the
> "strsplit/sapply" operation. Then you have in each row the returns from your
> 84 countries _and_ the month/year for that row. When I used the "by" function
> to get the monthly means from spdat, I showed you how to use the same code on
> your data frame, after creating the month and year columns, to get the
> monthly average return for the 84 countries for as many years as you have. As
> this should return a matrix of successive months as rows and countries as
> columns, you should easily be able to import this into Excel.
>
> The reason I did it this way was to illustrate how to define the grouping
> variable from the existing information and perform an easily understood
> calculation. I thought that using methods that automatically perform the
> operations I used might allow you to get the result without understanding how
> you had gotten it. Like the data I didn't have, I took a guess at how much my
> example would help you understand what was happening and give you the skills
> to do it yourself. I do hope that you get to the point where you are able to
> think of my example as unsophisticated, for I could have done it with
> dplyr/ts/reshape and the rest.
>
> Jim
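
To make the steps above concrete, here is a minimal sketch. It assumes a data frame called spdat with a character column "dates" in day-month-year form and one numeric column per country. The two country columns and every value in them are invented for illustration, since the real data were never posted. It also swaps in base R's aggregate() for the "by" call mentioned above, simply because aggregate() returns a plain data frame that is easy to export.

```r
# Hypothetical toy data: the real file has 84 country columns; the two
# columns and all values here are made up for illustration only.
spdat <- data.frame(
  dates = c("3-1-1994", "4-1-1994", "1-2-1994", "2-2-1994",
            "2-1-1995", "3-1-1995"),
  Australia = c(20, 22, 30, 34, 10, 14),
  Belarus   = c(21, 23, 31, 35, 11, 15),
  stringsAsFactors = FALSE
)

# Split each date at the hyphens, then extract the second (month)
# and third (year) pieces and store them as integers.
parts <- strsplit(spdat$dates, "-")
spdat$month <- as.integer(sapply(parts, "[", 2))
spdat$year  <- as.integer(sapply(parts, "[", 3))

# Treat every column that is not a date or grouping column as a country.
countries <- setdiff(names(spdat), c("dates", "month", "year"))

# Mean of every country column within each year/month combination.
monthly <- aggregate(spdat[countries],
                     by = list(month = spdat$month, year = spdat$year),
                     FUN = mean)

# Put the months in calendar order within each year.
monthly <- monthly[order(monthly$year, monthly$month), ]
print(monthly)

# Optional: write a CSV file that Excel can open (file name is arbitrary).
# write.csv(monthly, "monthly_means.csv", row.names = FALSE)
```

Because the year is part of the grouping, January 1994 and January 1995 stay in separate rows; grouping on month alone would pool them. The number of trading days in each month never has to be typed in, as it is whatever number of rows share that month and year.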
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.