Emmanuel,
  Thanks for your reply.  Please allow me to clarify.  I am already extensively 
using an RDBMS to store the data, and have used SQL and ODBC to extract the 
data into a set of R-files.  (I have experimented with this a bit, and for my 
specific application, storing the data in R seems to improve speed and 
convenience.  For example, I can extract the data only once, store it as an 
R-file, and then use the data any number of times, without ever again 
needing to "hit" the RDBMS.)
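
To make the "extract once, reuse many times" idea concrete, here is a rough sketch of the pattern. The helper name cached_sample, the cache directory and the "<name>.<year>.rds" file naming are all made up for illustration, and `fetch` stands in for whatever actually queries the RDBMS (for example a wrapper around RODBC's sqlQuery()).

```r
# Hypothetical helper: fetch a sample once, then reuse the cached copy.
# `fetch`, `cache_dir` and the file naming scheme are assumptions made
# for this sketch, not part of any existing package.
cached_sample <- function(name, year, fetch, cache_dir = "cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, sprintf("%s.%d.rds", name, year))
  if (file.exists(path)) {
    return(readRDS(path))    # cache hit: the RDBMS is not touched
  }
  dat <- fetch(name, year)   # cache miss: extract from the RDBMS once
  saveRDS(dat, path)         # store it as an R file for later sessions
  dat
}
```

Only the first call for a given name and year reaches the database; later calls read the stored file.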
   
  What I am trying to do:  I need to perform certain 
operations/processes/custom-functions on each "sample".  I can easily write the 
code to do this, using a "FOR-loop".  But I will then need a separate loop for 
each process I want to run, and will end up rewriting much of the same code 
inside each "FOR-loop".
   
  I have many different "processes" I might want to perform on each sample on 
any given day.  So instead of always re-writing the same loop, I want to write 
a function that takes as its input the "process", and then goes and runs it on 
each sample.
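
Roughly, I imagine something along these lines. This is only a sketch: run_on_samples and load_sample are names I made up, and it assumes each data frame is stored in its own file named "<name>.<year>.rds" (written with saveRDS()), which may not match the actual file layout.

```r
# Hypothetical sketch: run any "process" (an ordinary R function) on
# every sample. The file naming "<name>.<year>.rds" is an assumption.
load_sample <- function(name, year, dir = ".") {
  readRDS(file.path(dir, sprintf("%s.%d.rds", name, year)))
}

run_on_samples <- function(process, years, tables, dir = ".", ...) {
  results <- lapply(years, function(year) {
    # Load every data frame belonging to this sample into a named list
    sample_data <- lapply(tables, load_sample, year = year, dir = dir)
    names(sample_data) <- tables
    # The process receives the list of data frames and the sample year
    process(sample_data, year = year, ...)
  })
  names(results) <- as.character(years)
  results
}

# Example usage (assumes files such as data/births.1952.rds exist):
# mean_births <- function(d, year) mean(d$births$n)
# res <- run_on_samples(mean_births, 1950:1952, c("births", "gdp"),
#                       dir = "data")
```

Because the process is passed in as a function, the loop is written once and no eval(parse()) or gsub() on code text is needed.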
   
  Thanks
   
   
  From: Emmanuel Charpentier <charpent_at_bacbuc.dyndns.org> 
Date: Fri, 07 Dec 2007 00:00:21 +0100
    Thomas Pujol wrote: 
> R-help users, 
> Thanks in advance for any assistance ... I truly appreciate your expertise. I 
> searched help and could not figure this out, and think you can probably offer 
> some helpful tips. I apologize if I missed something, which I'm sure I 
> probably did. 
> 
> I have data for many "samples". (e.g. 1950, 1951, 1952, etc.) 
> 
> For each "sample", I have many data-frames. (e.g. temp.1952, births.1952, 
> gdp.1952, etc.) 
> 
> (Because the data is rather "large" (and for other reasons), I have chosen to 
> store the data as individual files, as opposed to a list of data frames.) 
> 
> I wish to write a function that enables me to "run" any of many custom 
> "functions/processes" on each sample of data. 
> 
> I currently accomplish this by using a custom function that uses: 
> "eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample 
> number" for each line of text I submit to "eval(parse(t=text.i2))" ). 
> 
> Is there a better/preferred/more flexible way to do this? 
Beware: what follows is the advice of someone used to working with data in an
RDBMS and SQL; as anyone should know, everything is a nail to a man with a
hammer. Caveat emptor...

Unless I misunderstand you, you are trying to treat piecewise a large dataset
made of a large number of reasonably-sized independent chunks.

What you're trying to do seems to me a bit like reinventing the SAS macro
language. What's the point?

IMNSHO, "large" datasets that are used only piecewise are much better handled
in a real database (RDBMS), queried at runtime via, for example, Brian
Ripley's RODBC.

In your example, I'd create a table "Births" with all your data + the relevant
year. Off the top of my head:

# Do that ONCE in the lifetime of your data: an RDBMS is probably more
# apt than R dataframes for this kind of management
library(RODBC)
channel <- odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS)
sqlSave(channel, tablename="Births",
        rbind(cbind(data.frame(Year=rep(1952,nrow(births.1952))),
                    births.1952),
              cbind(data.frame(Year=rep(1953,nrow(births.1953))),
                    births.1953),
              # ... ^W^Y ad nauseam ...
))
rm(births.1951, births.1952, ...) # get back breathing space

Beware: certain data types may be tricky to save! I got bitten by Dates
recently... See the RODBC documentation, your DBMS documentation and the "R
Data Import/Export" guide...

At analysis time, you may use the result of the relevant query exactly as one
of your dataframes. Instead of:

foo(... data=births.1952, ...)

type:

foo(... data=sqlQuery(channel,"select * from \"Births\" where \"Year\"=1952;"),
    ...) # Syntax illustrating talking to a "picky" DBMS...

Furthermore, the variable "Year" bears your "d" information. Problem
(dis)solved.

You may loop (or even sapply()...) at will on d:

for(year in 1952:1978) {
  query<-sprintf("select * from \"Births\" where \"Year\"=%d;",year)
  foo(... data=sqlQuery(channel,query), ...)
  ...
}

If you already use a DBMS with some connection to R (via RODBC or otherwise),
use that. If not, sqlite is a very lightweight library that enables you to use
a (very considerable) subset of SQL92 to manipulate your data.

I understand that some people on this list have undertaken the creation of a
sqlite-based package dedicated to this kind of large data management.

HTH,

                                     Emmanuel Charpentier


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
