Hi,

I am a total newbie to R, so I apologize if the answer to my question is too
obvious. I have a data set of the following form:

  Date                 V1  V...  VN  Region  Industry
  22/03/1995 23:01:12  1   3     2   15      A
  21/03/1995 21:01:12  3   3     1   9       C
  1/04/1995 17:01:06   3   2     1   3       B

Now I would like to analyze the data in the data.frame by Region, Industry,
Date (I would like to collapse the whole thing to weekly data) and by the
three different answer options {1,2,3} in V1...VN. In Stata, which I used
before, I did this step by step with a loop over all questions (V1...VN):

  egen pos_`X'=total(`X'==1), by(industry week_year)
  egen pos_`X'=total(`X'==2), by(industry week_year)

This step-by-step procedure works because Stata, even if the dates are
displayed as weeks, doesn't aggregate the values immediately. Unfortunately
there seems to be no command in R that works in exactly the same manner as
Stata's by(). My most successful attempt so far to accomplish the task
described above was:

as.data.frame(tapply(df[,1]==1, list(df$date, df$region, df$industry),
mean))

(where date is formatted as a week-of-year label using %U)
Of course I would have to loop this over all questions (20) and all
answer options (3), but at least it gives me output of the following
structure:

  .        industry.region  industry.region  industry.region  industry.region
  10-1995  32               45               10               9
  15-1995  2                47               5                6

I could live with that because I could recombine the resulting data frames
afterwards. My problem, however, is that tapply doesn't preserve the
data frame's format as a time series (xts). This means R aggregates by time
(week), industry and region, but the weeks on the x-axis are not in the
right order. I also tried apply.weekly(), but it doesn't seem to do what
I want.
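
To make the ordering problem concrete, here is a minimal sketch with made-up
dates (the first three come from the sample rows above, the fourth is
invented for illustration). It suggests that a week label with the week
number first sorts as plain text by week and only then by year, whereas a
label with the year first sorts chronologically. The variable names are
placeholders, not my real data.

```r
# Illustration only: three dates from the sample rows plus one invented 1996 date
dates <- as.POSIXct(
  c("22/03/1995 23:01:12", "21/03/1995 21:01:12",
    "1/04/1995 17:01:06", "05/01/1996 10:00:00"),
  format = "%d/%m/%Y %H:%M:%S", tz = "UTC"
)

# %U is the week of the year (00-53) with Sunday as the first day of the week
week_first <- format(dates, "%U-%Y")  # week number first, as in my output above
year_first <- format(dates, "%Y-%U")  # year first

# Character labels are sorted as text, not as dates
sorted_week_first <- sort(unique(week_first), method = "radix")
sorted_year_first <- sort(unique(year_first), method = "radix")

print(sorted_week_first)  # the 1996 week ends up before the 1995 weeks
print(sorted_year_first)  # chronological order
```

If this is indeed the cause, building the grouping key as year-week might
already be enough to get the weeks in the right order.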

Could anyone give me a hint on how I could do this? Maybe by formatting the
data frame as time series data beforehand and preserving that format during
the procedure. And maybe somebody also has an idea how I can avoid all
this looping.
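
One direction that might avoid the loops, sketched under assumptions rather
than tested on my real data: stack the question columns into a long format
and let table() count every answer option per week, industry, region and
question in one pass. The column names V1 to V3 (standing in for V1...VN),
the `week` key and the object names are all placeholders.

```r
# Made-up data in the shape of the sample rows above (hypothetical names)
df <- data.frame(
  Date     = c("22/03/1995 23:01:12", "21/03/1995 21:01:12",
               "1/04/1995 17:01:06"),
  V1       = c(1, 3, 3),
  V2       = c(3, 3, 2),
  V3       = c(2, 1, 1),
  Region   = c(15, 9, 3),
  Industry = c("A", "C", "B"),
  stringsAsFactors = FALSE
)

# Year-week key; the year comes first so the labels sort chronologically
when <- as.POSIXct(df$Date, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
df$week <- format(when, "%Y-%U")

questions <- c("V1", "V2", "V3")

# Stack the question columns: one row per (respondent, question)
long <- data.frame(
  week     = rep(df$week, times = length(questions)),
  Region   = rep(df$Region, times = length(questions)),
  Industry = rep(df$Industry, times = length(questions)),
  question = rep(questions, each = nrow(df)),
  answer   = factor(unlist(df[questions], use.names = FALSE), levels = 1:3)
)

# Count each answer option per week, industry, region and question
counts <- as.data.frame(
  table(long[c("week", "Industry", "Region", "question", "answer")]),
  responseName = "n"
)

# Drop the combinations that never occur
counts <- counts[counts$n > 0, ]
print(counts)
```

The result is one long data frame instead of 20 x 3 separate ones, so there
would be nothing to recombine afterwards.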

I would appreciate it very much if one of you could give me a
hint!

Best regards,

Andreas 


 
