Hi Eliza

It seems to me that you are not thinking things through before you start 
messing about with the data ahead of an analysis.

The years with data for 366 days are leap years. A leap year occurs every 
fourth year (apart from century years not divisible by 400), and the extra day 
falls on the 29th of February. I guess it is the result from the dcast 
function that screws things up and makes you believe that the extra day is day 
number 366.
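
A quick way to see where the extra row actually sits, as a sketch on made-up 
data: the data frame DF, its dummy discharge values and the helper is_leap are 
invented for illustration, and I use base R's tapply instead of dcast so that 
no package is needed. The layout (rows ordered by month and day, one column 
per year) is meant to mimic what the dcast call produces.

```r
# Leap-year rule of the Gregorian calendar (hypothetical helper name)
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

# Toy long-format data: one non-leap year (2003) and one leap year (2004)
dates <- seq(as.Date("2003-01-01"), as.Date("2004-12-31"), by = "day")
DF <- data.frame(
  year      = as.integer(format(dates, "%Y")),
  month     = as.integer(format(dates, "%m")),
  day       = as.integer(format(dates, "%d")),
  discharge = seq_along(dates)   # dummy values, not real measurements
)

# Wide layout: rows keyed by "MM-DD" (sorted), one column per year
key  <- sprintf("%02d-%02d", DF$month, DF$day)
wide <- tapply(DF$discharge, list(key, DF$year), mean)

dim(wide)                      # 366 rows, 2 columns
which(is.na(wide[, "2003"]))   # the only NA in 2003 is the "02-29" row (row 60)
```

So in a non-leap year the NA lands on February 29th, i.e. row 60, while row 
366 is December 31st and holds a real value.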

The best thing to do is to run your analysis on the complete data, with 
missing values on February 29th for the non-leap years.

Or you can discard February 29th in the leap years and do the analysis with 
365 days for all years.
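
Dropping the leap day is easiest on the long-format data, before any 
reshaping. A minimal sketch, assuming the column names year, month and day 
from the original post (the data frame DF and its dummy values are made up 
here):

```r
# Toy long-format data covering a non-leap year (2003) and a leap year (2004)
dates <- seq(as.Date("2003-01-01"), as.Date("2004-12-31"), by = "day")
DF <- data.frame(
  year      = as.integer(format(dates, "%Y")),
  month     = as.integer(format(dates, "%m")),
  day       = as.integer(format(dates, "%d")),
  discharge = seq_along(dates)   # dummy values, not real measurements
)

# Keep every row except February 29th
DF365 <- DF[!(DF$month == 2 & DF$day == 29), ]

# Every year now has exactly 365 rows
table(DF365$year)
```

After this step the reshaped matrices have 365 rows for every year and no NA 
row needs to be handled.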

What is the rationale for imputing missing data using the approx function? I 
mean, a non-leap year has only 365 days.

If for some reason you want to fill in values for the NAs, one "natural" way 
is to replace the NA on February 29th with the mean of the values on February 
28th and March 1st. I think there is an na.approx function for that in some 
package (perhaps zoo). Other methods are available in R: google for 
R + impute.
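
For what it is worth, the neighbour-mean fill can be sketched in base R 
without any package. Everything here is illustrative: the vector x stands for 
one non-leap-year column on a 366-row (month, day) grid, its values are made 
up, and fill_feb29 is a hypothetical helper name. It assumes rows 59, 60 and 
61 are February 28th, February 29th and March 1st.

```r
# One made-up column for a non-leap year; row 60 (Feb 29) is missing
x <- as.numeric(1:366)
x[59] <- 4   # Feb 28
x[60] <- NA  # Feb 29
x[61] <- 6   # Mar 1

# Replace a missing Feb 29 by the mean of its two neighbours
fill_feb29 <- function(v) {
  if (is.na(v[60])) v[60] <- mean(c(v[59], v[61]))
  v
}
filled <- fill_feb29(x)
filled[59:61]   # 4 5 6

# approx() gives the same result by linear interpolation. It returns a list
# with components x and y; the interpolated values are in $y only.
y_approx <- approx(seq_along(x), x, xout = seq_along(x))$y
```

Taking only the $y component keeps the result a plain numeric vector, so it 
can be put back into a matrix column, or applied over all columns with 
something like apply(m, 2, fill_feb29) for a 366-row matrix m.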

Best Regards

Frede



-------- Original message --------
From: eliza botto
Date: 13/06/2014 20.48 (GMT+01:00)
To: r-help@r-project.org
Subject: Re: [R] data format setting

Thanks Dennis,
It worked, but I had to make some simple modifications to get to the final 
format.
Now I have a list in the following format:
$A
2004    2005    2006    2007    2008    2009    2010
..
...
...
..
...

$AY

1967    1968    ....    2000...........

Some columns had 365 rows and some 366; those with 365 rows had NA in their 
366th row.
Now I want to apply the approx command to interpolate the 366 values to 365, 
but when I apply approx it returns a list with $x and $y, and frankly speaking 
it messed up everything.
Is there a way to do it neatly without the format deteriorating?


In any case, thank you very much indeed.

Eliza

> Date: Fri, 13 Jun 2014 11:11:37 -0700
> Subject: Re: [R] data format setting
> From: djmu...@gmail.com
> To: eliza_bo...@hotmail.com
>
> Hi:
>
> Maybe something like this:
>
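> library(reshape2)  # dcast()
> library(plyr)      # llply()
> # Split by station so that each list element holds one station's years
> L <- split(DF, DF$st)
> L2 <- llply(L, function(d) dcast(d, month + day ~ year, value.var =
> "discharge"))
>
> Obviously untested, so caveat emptor. The idea is to use the dcast
> function to reshape the data from long to wide format within each
> station, with one column per year.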
>
> Dennis
>
> On Fri, Jun 13, 2014 at 8:55 AM, eliza botto <eliza_bo...@hotmail.com> wrote:
> >
> > Dear R family,
> > I hope you are all doing great. I have a dataset in the following 
> > format.
> >
> >       st year month day discharge
> > 1  A     2004     1   1  6.752828
> > 2  A     2004     1   2  7.602053
> > 3  A     2004     1   3  5.583619
> > 4  A     2004     1   4  5.019562
> > 5  A     2004     1   5  4.804489
> > 6  A     2004     1   6  4.363541
> > 7  A     2004     1   7  3.801333
> > 8  A     2004     1   8  3.455991
> > 9  A     2004     1   9  3.402634
> > 10 A     2004     1  10  3.250693
> > ......
> > ......
> > continue
> > ......
> > ......
> >          st year month day discharge
> > 20000    AY 1967    10   3      0.56
> > 20001    AY 1967    10   4      0.56
> > 20002    AY 1967    10   5      0.48
> > 20003    AY 1967    10   6      0.56
> > 20004    AY 1967    10   7      0.48
> > 20005    AY 1967    10   8      0.40
> > 20006    AY 1967    10   9      0.40
> > 20007    AY 1967    10  10      0.56
> > 20008    AY 1967    10  11      0.56
> > 20009    AY 1967    10  12      0.65
> > 20010    AY 1967    10  13      0.85
> >
> > You can see that there are five columns.
> > The first column has the name of the station. I want to split the data 
> > by station name. Each station has data for certain years: for example, 
> > "A" has data for the years 2004 to 2010 and "AY" for 1967 to 2000. 
> > Similarly, the other stations have data for different numbers of years.
> > I want to make a list of matrices, each containing the data for one 
> > station, in the following format:
> > $A
> > 2004    2005    2006    2007    2008    2009    2010
> > ..
> > ...
> > ...
> > ..
> > ...
> >
> > $AY
> >
> > 1967    1968    ....    2000
> >
> > Each column should have 365 or 366 values depending on whether the year 
> > is a leap year or not. Obviously, for non-leap years the 366th row should 
> > be NA.
> > Kindly help me with it.
> > Thank you very much in advance.
> > Eliza
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
