replace `lst2` with:
#Subset of data

lst1Sub <- lapply(lst1Not1970,function(x) x[c(1:25, 18707:18708)])

library(stringr) #for str_trim()
lst2 <- lapply(lst1Sub,function(x) {
  dateSite <- gsub("(.*G.{3}).*","\\1",x)
  dat1 <- data.frame(Year=as.numeric(substr(dateSite,1,4)),
                     Month=as.numeric(substr(dateSite,5,6)),
                     Day=as.numeric(substr(dateSite,7,8)),
                     Site=substr(dateSite,9,12),stringsAsFactors=FALSE)
  Sims <- str_trim(gsub(".*G.{3}\\s?(.*)","\\1",x))
  #insert a space where a minus sign is glued to the preceding value
  Sims[grep("\\d+-",Sims)] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)","\\1 \\2",
    gsub("^([0-9]+\\.[0-9]+)(.*)","\\1 \\2", Sims[grep("\\d+-",Sims)]))
  Sims1 <- read.table(text=Sims,header=FALSE)
  names(Sims1) <- c("Precipitation", "Tmin", "Tmax")
  dat2 <- cbind(dat1,Sims1)})
[[1]]
   Year Month Day Site Precipitation   Tmin  Tmax
25 1971     1   1 GG25          0.36 -14.32  3.87
26 1971     6   5 G107        144.09  11.25 30.44
27 1971     6   5 G108          0.66   9.33 32.96
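
The difference between the two patterns can be checked on a couple of lines. The sketch below is only an illustration: the two sample strings are copied from the layout quoted in this thread, and the variable names are made up. It shows that the greedy `G\\d+` pattern finds no match when a three-digit precipitation value follows the site code directly, so gsub() returns the line unchanged, while `G.{3}` stops after the site code.

```r
# Two lines in the layout quoted in this thread (site code in columns 9-12).
x <- c("1971 1 1GG25  0.36-14.32  3.87",
       "1971 6 5G107144.09 11.25 30.44")

# G\\d+ needs whitespace after the digits. In the second line the digits run
# on into "144", so nothing matches and the line is returned untouched.
sims_digits <- gsub(".*G\\d+\\s+(.*)", "\\1", x)

# G.{3} takes exactly three characters after the last "G", which is where the
# site code ends in these two lines.
sims_fixed <- trimws(gsub(".*G.{3}\\s?(.*)", "\\1", x))

print(sims_digits)
print(sims_fixed)
```

When the unchanged line is handed to read.table(), it splits into five fields instead of three, which matches the "did not have 3 elements" error reported in this thread.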



On Monday, March 31, 2014 2:35 AM, Zilefac Elvis <zilefacel...@yahoo.com> wrote:

Hi AK,
I figured out that the error is from "Sim1971-2000_Daily_Sim001.dat".
The other files had no error when I ran this section of the code which detects 
an error:

lst2 <- lapply(lst1Sub, function(x) {
  dateSite <- gsub("(.*G\\d+).*","\\1",x)
  dat1 <- data.frame(Year=as.numeric(substr(dateSite,1,4)),
                     Month=as.numeric(substr(dateSite,5,6)),
                     Day=as.numeric(substr(dateSite,7,8)),
                     Site=substr(dateSite,9,12),stringsAsFactors=FALSE)
  Sims <- gsub(".*G\\d+\\s+(.*)","\\1",x)
  Sims[grep("\\d+-",Sims)] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)","\\1 \\2",
    gsub("^([0-9]+\\.[0-9]+)(.*)","\\1 \\2", Sims[grep("\\d+-",Sims)]))
  Sims1 <- read.table(text=Sims,header=FALSE)
  names(Sims1) <- c("Precipitation", "Tmin", "Tmax")
  dat2 <- cbind(dat1,Sims1)})

After examining line 18707 of lst1Sub, I found that it reads as 
1971 6 5G107144.09 11.25 30.44. When I replace 144.09 with 44.09, the code 
runs perfectly. 144.09 is a very high value, but that is what the simulation 
produced from the calibrated model. In most cases, Precip values have two 
digits before the decimal point. However, in some cases, as above, there can 
be three digits before the decimal point.

How can we avoid the error and read the data as is? Please try to include the 
bold text in the code below and see what happens:

lst2 <- lapply(lst1Sub, function(x) {
  dateSite <- gsub("(.*G\\d+).*","\\1",x)
  dat1 <- data.frame(Year=as.numeric(substr(dateSite,1,4)),
                     Month=as.numeric(substr(dateSite,5,6)),
                     Day=as.numeric(substr(dateSite,7,8)),
                     Site=substr(dateSite,9,12),stringsAsFactors=FALSE)
  Sims <- gsub(".*G\\d+\\s+(.*)","\\1",x)
  Sims[grep("\\d+-",Sims)] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)","\\1 \\2",
    gsub("^([0-9]+\\.[0-9]+)(.*)","\\1 \\2", Sims[grep("\\d+-",Sims)]))
  Sims1 <- read.table(text=Sims,header=FALSE)
  names(Sims1) <- c("Precipitation", "Tmin", "Tmax")
  dat2 <- cbind(dat1,Sims1)})

Thanks AK.

On Sunday, March 30, 2014 11:01 PM, Zilefac Elvis <zilefacel...@yahoo.com> wrote:

Hi AK,
You did just what I wanted. I tried it using this subset:
#Using a small subset:
lst1Sub <- lapply(lst1Not1970,function(x) x[1:1000]) 

and it worked so well.

However, I would like to do it for all the data, so I changed x[1:1000]
to lst1Sub <- lapply(lst1Not1970,function(x) x) # did I make a mistake here?
and got an error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 18707 did not have 3 elements

Where could the mistake be coming from? 
I tried lst1Sub <- lapply(lst1Not1970,function(x) x[1:18708])
but encountered same error.

lst1Sub <- lapply(lst1Not1970,function(x) x[1:18706]) works perfect.

I have opened all the three files and checked line 18707 but found nothing 
wrong with the values. Please help.
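
One way to narrow down an error like this is to count the fields in each string before it reaches read.table(). The sketch below is a diagnostic under assumptions: `bad_lines` is a hypothetical helper, and the sample vector stands in for the `Sims` strings built inside the lapply() call. The line number in the scan() message refers to a position in that vector, so the values can look fine in the file itself.

```r
# Hypothetical helper: indices of strings that do not split into `n`
# whitespace-separated fields.
bad_lines <- function(sims, n = 3) {
  n_fields <- lengths(strsplit(trimws(sims), "\\s+"))
  which(n_fields != n)
}

sims <- c("0.36 -14.32 3.87",
          "1971 6 5G107144.09 11.25 30.44",  # date and site were not stripped
          "0.66 9.33 32.96")

bad_lines(sims)
sims[bad_lines(sims)]
```

Printing the offending strings usually shows at a glance whether the date and site prefix was left in place or two values ran together.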

On Sunday, March 30, 2014 7:21 PM, arun <smartpink...@yahoo.com> wrote:

I did exactly the same as you mentioned, but on a smaller dataset as it takes 
time.  Also, your dataset is not very consistent in formatting, especially in 
the Precipitation, Tmin, Tmax columns.  For example, some values are:
0.48-2.14 -1.48
1.48 -2.12-1.21

Check the spacing in the two lines above.  Anyway, I did change those in 
the subset dataset.  I am not sure whether there are other problems in your 
original dataset.  
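
A minimal sketch of one way to separate such glued values, assuming the only problem is a minus sign that directly follows a digit (the sample strings are the two quoted above plus one regular line):

```r
sims <- c("0.48-2.14 -1.48", "1.48 -2.12-1.21", "0.00 -4.37  8.06")

# Put a space in front of any minus sign that directly follows a digit.
fixed <- gsub("([0-9])-", "\\1 -", sims)

vals <- read.table(text = fixed, header = FALSE,
                   col.names = c("Precipitation", "Tmin", "Tmax"))
print(vals)
```

This does not help when two positive values run together, because there is then no sign character to split on.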

On Sunday, March 30, 2014 7:09 PM, Zilefac Elvis <zilefacel...@yahoo.com> wrote:

Hi AK,
I will try the code you just sent when I reach home.
However, let me use the example you just provided to be clearer about what 
the output should look like.
#[1] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[3] "Sim1971-2000_Daily_Sim003.dat"

The above 3 files represent 3 simulations.
In the three folders, I will have 120 files per folder.
As you said:
1989 4 5GG38 0.48 -3.25 13.69 
year month day site Precipitation Tmin Tmax.
So, in the output, I wanted in the:
#Precipitation folder
1989 4 5GG38  0.48 
1989 4 5GG39  0.00 
1989 4 5GG40  0.00
1989 4 5GG41  0.00 

But the individual files should have as filenames:
filename40=GG40 and so on for 120 sites.
The contents of each filename should be year,Month,Day, sim001,sim002,sim003
So, take all site codes and use them to name the files in each folder. Within 
each site code, there are precipitation values from 1971-2005 and from 

In essence, I had 120 sites, and each site had Precipitation, Tmin and Tmax. 
I did 3 simulations and put them in the 'sample' file. The simulation is from 
1971-2005. Now I want to take each site and, for each variable (in 3 folders), 
create a dataframe where the 3 simulations are stored with colnames: 
year,Month,Day, sim001,sim002,sim003. Do this for Precip, Tmin and Tmax 
separately.

#Tmin folder
1989 4 5GG38   -3.25 
1989 4 5GG39   -9.82 
1989 4 5GG40  -14.74 
1989 4 5GG41   -4.37  

.... do same as for precipitation

#Tmax folder
1989 4 5GG38   13.69
1989 4 5GG39   10.75
1989 4 5GG40   -1.13
1989 4 5GG41   8.06

Do same as for precipitation

Thanks very much,
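
The layout requested above (one folder per variable, one file per site, one column per simulation) could be sketched as follows. Everything here is illustrative: `make_sim`, `site_table`, the toy values and the `.txt` file names are assumptions, and the sketch relies on every simulation holding the same dates and sites in the same row order.

```r
# Toy stand-in for the parsed simulations: a named list with one data frame
# per simulation file, all in the same row order (an assumption).
make_sim <- function(shift) {
  data.frame(Year = 1989, Month = 4, Day = c(5, 5, 6, 6),
             Site = c("GG38", "GG39", "GG38", "GG39"),
             Precipitation = c(0.48, 0.00, 0.25, 0.10) + shift,
             Tmin = c(-3.25, -9.82, -2.00, -8.00) + shift,
             Tmax = c(13.69, 10.75, 12.00, 9.00) + shift,
             stringsAsFactors = FALSE)
}
sims <- list(sim001 = make_sim(0), sim002 = make_sim(1), sim003 = make_sim(2))

# Hypothetical helper: date columns plus one column per simulation for a
# single site and variable.
site_table <- function(sims, site, variable) {
  first <- sims[[1]]
  dates <- first[first$Site == site, c("Year", "Month", "Day")]
  values <- do.call(cbind, lapply(sims, function(d) d[[variable]][d$Site == site]))
  tab <- data.frame(dates, values)
  rownames(tab) <- NULL
  tab
}

# One folder per variable, one file per site (written to a temporary directory).
out_dir <- tempfile("sim_output_")
for (variable in c("Precipitation", "Tmin", "Tmax")) {
  dir.create(file.path(out_dir, variable), recursive = TRUE)
  for (site in unique(sims[[1]]$Site)) {
    write.table(site_table(sims, site, variable),
                file.path(out_dir, variable, paste0(site, ".txt")),
                row.names = FALSE, quote = FALSE)
  }
}

site_table(sims, "GG38", "Tmin")
```

With 120 sites and 100 simulations the same loop would produce 120 files in each of the three folders, each with Year, Month, Day and 100 simulation columns.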
On Sunday, March 30, 2014 2:11 PM, arun <smartpink...@yahoo.com> wrote:

I have one more doubt.  You mentioned 120 files in the Precipitation folder, 
and similarly in Tmin and Tmax.  In the "sample" folder, you have simulation 
files:

#[1] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[3] "Sim1971-2000_Daily_Sim003.dat" 

As I understand the problem, you would have 120*3 ie. 360 files each for 
Precipitation, Tmin and Tmax from the Simulation datasets.  Please be clear 
about what you wanted.

On Sunday, March 30, 2014 1:34 PM, arun <smartpink...@yahoo.com> wrote:
Just to be clear:

1989 4 5GG38 0.48 -3.25 13.69 
year month day site Precipitation Tmin Tmax.
So, in the output, you wanted:
1989 4 5GG38 0.48
1989 4 5GG38 -3.25
1989 4 5GG38 13.69

If you had simply shown what you wanted, as above, instead of a long 
description, we wouldn't have needed this back-and-forth of emails.

Also, you mentioned a lot of simulations (sim1 to sim10).  According to your 
description:
" For example, I will take precip from site GGG1 and have a data frame with 
colnames such as Year,Month,Day, sim1,sim2,...,sim100. Repeat this for all 120 
sites. So that for Precip, you will have 120 files corresponding to the site 
codes. Each file has nrows with Year,Month,Day, sim1...sim100 columns."

What I understand is that in the Precipitation folder, there are 120 files:
For example (using the same data):
1989 4 5GGG1 0.48

1989 4 6GGG1 0.25

#2nd precipitation file:

1989 4 5GGG2 0.74
1989 4 6GGG2 0.84


Now, back to the Sim1 ....sim100.  

indxdat <- 

     [,1]   [,2]
[1,] "Sim1" "Precipitation"
[2,] "Sim2" "Tmax"
[3,] "Sim3" "Tmin"
[4,] "Sim4" "Precipitation"
[5,] "Sim5" "Tmax"

In your original file, if this is how the values are repeated, then:
In your result dataset:
Precipitation folders contain:
year month day Site Sim1 Sim4 Sim7 ....Sim100

Tmax folder:
year month day Site Sim2 Sim5 ....

Let me know if this is what you wanted the output.

Also, if you respect the positions I gave you, the variables will be perfectly 
split. Precipitation has no -(minus). The positions ensure that values do not 
cross from one variable to another
when the splitting is done.

Thanks, Atem.
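
Splitting by position, as suggested here, can be sketched with substr(). The helper name `parse_fixed` is hypothetical. The positions are the ones given in this thread (Year 1-4, Month 5-6, Day 7-8, Site 9-12, Precipitation 13-18, Tmax 25-30), with Tmin assumed to occupy 19-24, the six columns between Precipitation and Tmax.

```r
# Hypothetical helper: cut each line at fixed column positions.
parse_fixed <- function(lines) {
  num <- function(from, to) as.numeric(substr(lines, from, to))
  data.frame(Year = num(1, 4), Month = num(5, 6), Day = num(7, 8),
             Site = trimws(substr(lines, 9, 12)),
             Precipitation = num(13, 18),
             Tmin = num(19, 24),   # assumed positions, see the note above
             Tmax = num(25, 30),
             stringsAsFactors = FALSE)
}

lines <- c("1989 4 5GG40  0.00-14.74 -1.13",
           "1971 6 5G107144.09 11.25 30.44",
           "19701228GGG1  3.89 -3.94  7.90")

parsed <- parse_fixed(lines)
parsed <- parsed[parsed$Year >= 1971, ]  # 1970 only initialised the simulation
print(parsed)
```

Because nothing depends on spaces or on where the site code seems to end, glued values such as `0.00-14.74` and `G107144.09` are split correctly. A pattern keyed on the last "G" could still go wrong on a hypothetical line such as `1971 6 5GGG1144.09 ...`, where `G.{3}` would take "114" as the tail of the site code. `read.fwf()` with `widths = c(4, 2, 2, 4, 6, 6, 6)` is another route to the same split when reading straight from a file.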

------ Original Message ------

>From : arun
>To : Zilefac Elvis;
>Sent : 30-03-2014 02:14
>Subject : Re: Re: Please help
>HI Atem, It is still not clear.  You mentioned Precipitation occupies 13-18. 
>But, in the file, after the site, it is "Sim1", "Sim2", Sim3. etc.  So, I am 
>not sure what you are referring to Precipitation.
Tell me, in this data:

1989 4 5GG38  0.48 -3.25 13.69
1989 4 5GG39  0.00 -9.82 10.75
1989 4 5GG40  0.00-14.74 -1.13
1989 4 5GG41  0.00 -4.37  8.06

Which one is Precipitation, Tmax, and Tmin?

Arun

On Sunday, March 30, 2014 1:38 AM, Zilefac Elvis wrote:

Hi AK,
> I was able to download the files.  In those files, the formatting is not 
> consistent.

I am glad you finally downloaded the files. As I indicated in the email 
description, the analysis starts from 1971 to 2005. Any values before 1971 are 
meaningless. I used 1970 to initialize my simulation. So, let's start in 1971 
onwards.

> Also, I didn't quite understand about splitting by Precipitation, Tmin, Tmax

If you look at this output:

1989 4 5GG38  0.48 -3.25 13.69
1989 4 5GG39  0.00 -9.82 10.75
1989 4 5GG40  0.00-14.74 -1.13
1989 4 5GG41  0.00 -4.37  8.06

The first 4 values represent the Year, and the next two values are the Month. 
For example, April is coded as 04 but the zero is just 'space', and December 
is coded as 12. After the month, the next two values represent the Day (1 to 
31/30/28) of the month. The GGGs represent the site code. For example, site 1 
= GGG1 and site 120 is G120.

Now, if you open one of the Sim1971-2000_Daily_ files in an editor, it is a 
Fortran-style read. For example, in all the files (see code below), "Year" 
occupies position 1-4, "Month" occupies position 5-6, "Day" occupies position 
7-8, "Site" occupies position 9-12, Precipitation occupies position 13-18, 
Tmin occupies position 19-24, and Tmax occupies position 25-30. In another 
project, I read such files into the R workspace using this code:

rain.data <- scan("gaugvals.all",what=character(),sep="\n") # change 
# 'gaugvals.all' to file names in your directory
rain.data <- data.frame(Year=as.numeric(substr(rain.data,1,4)),
                        Tmax=as.numeric(substr(rain.data,25,30))) # please check that brackets are

Now you should begin to get a feel of the data coding and how to split 
precipitation, Tmin and Tmax.

> You mentioned that the columns are Year month date Sim1, Sim2, Sim3.  So, 
> where is the info to split to the three folders?

The original data file before I did the simulation was a dataframe which you 
helped me to re-arrange following the instructions I gave you. You used this 
code to reshape the data into the format which now appears in the 
Sim1971-2000_Daily_ files.

dat1 <- read.table("predictand.csv",header=TRUE,stringsAsFactors=TRUE,sep="\t")
# Predictand.csv had 123 columns with the columns 1,2,3 as date.
dat2M <- melt(dat1,id.var=c("year","month","day"))
dat2M1 <- dat2M[with(dat2M,order(year,month,day,variable)),]
#[1] 1972320       5
row.names(dat2M1) <- 1:nrow(dat2M1)
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax)

So you can see that here we reshaped the original data to 
[year,month,day,site,variable]. I did this for Precip, Tmin and Tmax 
separately and then combined them using 
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax). This is just how the sim files 
are structured. Our task now is to do the opposite of the above code and undo 
cbind(precipitation,Tmin,Tmax) so that precip, tmin and tmax will have 
separate folders. In each of the 3 folders, there will be 120 files named by 
site codes. Each final file has nrows with Year,Month,Day, sim1...sim100 
columns. But for the sample data I sent you, I think there are only 3 
simulations, so we will have as final output Year,Month,Day, sim1, sim2, sim3 
columns.

Let me know if you get a feel of what I am trying to achieve. Thanks very much 
AK.
Atem.

On Saturday, March 29, 2014 9:30 PM, arun wrote:

HI Atem,
I was able to download the files.  In those files, the formatting is not 
consistent.

1989 4 5GG38  0.48 -3.25 13.69
1989 4 5GG39  0.00 -9.82 10.75
1989 4 5GG40  0.00-14.74 -1.13
1989 4 5GG41  0.00 -4.37  8.06

Compared to:

19701228GGG1  3.89 -3.94  7.90
19701228GGG2  3.89 -3.94  7.90
19701228GGG3  3.89 -3.94  7.90
19701228GGG4  3.89 -3.94  7.90

Also, I didn't quite understand about splitting by Precipitation, Tmin, Tmax.  
You mentioned that the columns are Year month date Sim1, Sim2, Sim3.  So, 
where is the info to split to the three folders?

Arun

On Friday, March 28, 2014 1:56 AM, Zilefac Elvis wrote:

Hi AK,
Attached is a sample from the large file. The expected output is explained at 
the end of this message (bold). It is a little lengthy but is worth it given 
that the number of sites is plentiful. I have attached three simulations, so 
you will have sim1,sim2,sim3 instead of sim1 to sim100 as in the previous 
message.

I have done some simulations in R and would like to order my data to usable 
format. The data is too large so I have attached via Dropbox.
When you load Calibration.RData to the workspace, you will find the site codes 
(column 1) in "Prairies.Sites".
My initial dataset was in the form of a dataframe with columns denoting 
stations. So I had three dataframes each for precipitation, Tmin, and Tmax. 
Individually, you reshaped the dataframes to three column vectors (see file 
called PrecipTminTmax) using this code:

dat1 <- read.table("predictand.csv",header=TRUE,stringsAsFactors=TRUE,sep="\t")
# Predictand.csv had 123 columns with the columns 1,2,3 as date.
dat2M <- melt(dat1,id.var=c("year","month","day"))
dat2M1 <- dat2M[with(dat2M,order(year,month,day,variable)),]
#[1] 1972320       5
row.names(dat2M1) <- 1:nrow(dat2M1)
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax)

The problem to be solved

Attached is a large file (SimCalibration.zip) containing my simulations 
(001 to 100). Please import files starting with "Sim1971-2000_Daily_" only. The 
rest is not important. My analysis is for the period 1971-2000. Any data before 
or after this period should be ignored.
My simulation was done in R using Fortran encoding to read data values. All 
files are ".dat".
In each file, the columns are as follows:
Year, Month, Day, Site, Precip, Tmin, Tmax.

In another project involving rainfall only, I read such files into R using 
this code:

rain.data <- scan("gaugvals.all",what=character(),sep="\n",n=257212)
rain.data <- data.frame(Year=as.numeric(substr(rain.data,1,4)),
                        Rain=as.numeric(substr(rain.data,13,18)))

Q1) So, I would like to read all files beginning with "Sim1971-2000_Daily_".
2) Split each file by variable name (Precip, Tmin, Tmax) and then arrange each 
variable in the form of a dataframe. For example, I will take precip from site 
GGG1 and have a data frame with colnames such as Year,Month,Day, 
sim1,sim2,...,sim100. Repeat this for all 120 sites. So that for Precip, you 
will have 120 files corresponding to the site codes. Each file has nrows with 
Year,Month,Day, sim1...sim100 columns.
3) Please repeat the above for Tmin and Tmax so that in the end I will have 
three folders (Precip, Tmin and Tmax). Each folder has 120 files with each 
file being a dataframe containing date and 100 columns.

When you successfully go through this "difficult" section, I will access each 
folder, read each file and apply a function to it one at a time.
Thanks AK, this is part of my MSc thesis project. Your help would be fully 
acknowledged. You have helped me a lot towards the success of this project.
Atem.

On Thursday, March 27, 2014 9:09 PM, arun wrote:

HI Atem, I tried to download the first file. It is taking me forever.  With 
the speed I have, I doubt it would be successful.  Can you just provide some 
small reproducible example data and what your expected output would be?
Arun

On Thursday, March 27, 2014 9:50 AM, "zilefacel...@yahoo.com" wrote:

Oh! Hope you had a safe trip.  No problem AK. Please try and see what you can 
do. I will be waiting.  Have a great time in a beautiful country.
Atem.

------ Original Message ------

From : 
>To : Zilefac Elvis;
>Sent : 27-03-2014 04:07
>Subject : Re: Please help
>HI Zilefac, I was on flight.  Right now, I am in India.  At my place, the 
>speed is not so great to download large files.  I will try later if it works.
Arun

On Thursday, March 27, 2014 12:54 AM, Zilefac Elvis wrote:

Hi AK,
Please I need your help again.
I have done some simulations in R and would like to order my data to usable 
format.
The data is too large so I have attached via Dropbox.
When you load Calibration.RData to the workspace, you will find the site codes 
(column 1) in "Prairies.Sites".
My initial dataset was in the form of a dataframe with columns denoting 
stations. So I had three dataframes each for precipitation, Tmin, and Tmax. 
Individually, you reshaped the dataframes to three column vectors (see file 
called PrecipTminTmax) using this code:

library(reshape2)
dat1 <- read.table("predictand.csv",header=TRUE,stringsAsFactors=TRUE,sep="\t")
# Predictand.csv had 123 columns with the columns 1,2,3 as date.
dat2M <- melt(dat1,id.var=c("year","month","day"))
dat2M1 <- dat2M[with(dat2M,order(year,month,day,variable)),]
#[1] 1972320       5
row.names(dat2M1) <- 1:nrow(dat2M1)
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax)

The problem to be solved

Attached is a large file (SimCalibration.zip) containing my simulations 
(001 to 100). Please import files starting with "Sim1971-2000_Daily_" only. The 
rest is not important. My analysis is for the period 1971-2000. Any data before 
or after this period should be ignored.
My simulation was done in R using Fortran encoding to read data values. All 
files are ".dat". In each file, the columns are as follows:
Year, Month, Day, Site, Precip, Tmin, Tmax

In another project involving rainfall only, I read such files into R using 
this code:

rain.data <- scan("gaugvals.all",what=character(),sep="\n",n=257212)
rain.data <- data.frame(Year=as.numeric(substr(rain.data,1,4)),
                        Rain=as.numeric(substr(rain.data,13,18)))

Q1) So, I would like to read all files beginning with "Sim1971-2000_Daily_".
2) Split each file by variable name (Precip, Tmin, Tmax) and then arrange each 
variable in the form of a dataframe. For example, I will take precip from site 
GGG1 and have a data frame with colnames such as Year,Month,Day, 
sim1,sim2,...,sim100. Repeat this for all 120 sites. So that for Precip, you 
will have 120 files corresponding to the site codes. Each file has nrows with 
Year,Month,Day, sim1...sim100 columns.
3) Please repeat the above for Tmin and Tmax so that in the end I will have 
three folders (Precip, Tmin and Tmax). Each folder has 120 files with each 
file being a dataframe containing date and 100 columns.

When you successfully go through this "difficult" section, I will access each 
folder, read each file and apply a function to it one at a time.
Thanks AK, this is part of my MSc thesis project. Your help would be fully 
acknowledged. You have helped me a lot towards the success of this project.
Atem.
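
The first step of the request (read only the files whose names start with "Sim1971-2000_Daily_" and drop anything outside 1971-2000) might look like the sketch below. The temporary directory, the two stand-in files and the names `lst1`, `in_period` and `lst1_period` are assumptions made so that the example runs on its own. Real code would point `sim_dir` at the unzipped folder.

```r
# Stand-in directory with one simulation file and one unrelated file.
sim_dir <- tempfile("SimCalibration_")
dir.create(sim_dir)
writeLines(c("19701228GGG1  3.89 -3.94  7.90",
             "1971 1 1GGG1  0.36-14.32  3.87"),
           file.path(sim_dir, "Sim1971-2000_Daily_Sim001.dat"))
writeLines("not a simulation file", file.path(sim_dir, "Other_Sim001.dat"))

# Read only the files whose names start with the requested prefix.
files <- list.files(sim_dir, pattern = "^Sim1971-2000_Daily_.*\\.dat$",
                    full.names = TRUE)
lst1 <- lapply(files, readLines)
names(lst1) <- basename(files)

# Keep lines whose year (columns 1-4) falls inside 1971-2000.
in_period <- function(x) {
  year <- as.numeric(substr(x, 1, 4))
  x[year >= 1971 & year <= 2000]
}
lst1_period <- lapply(lst1, in_period)
str(lst1_period)
```

Each element of the resulting list is a character vector of raw lines, ready to be split into Year, Month, Day, Site and the three variables.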

R-help@r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
