Thomas Pujol wrote:
> R-help users,
> Thanks in advance for any assistance ... I truly appreciate your expertise.
> I searched help and could not figure this out, and think you can probably
> offer some helpful tips. I apologize if I missed something, which I'm sure I
> probably did.
>
> I have data for many "samples". (e.g. 1950, 1951, 1952, etc.)
>
> For each "sample", I have many data-frames. (e.g. temp.1952, births.1952,
> gdp.1952, etc.)
>
> (Because the data is rather "large" (and for other reasons), I have chosen
> to store the data as individual files, as opposed to a list of data frames.)
>
> I wish to write a function that enables me to "run" any of many custom
> "functions/processes" on each sample of data.
>
> I currently accomplish this by using a custom function that uses:
> "eval(parse(t=text.i2)) ", and "gsub(pat, rep, x)" (this changes the "sample
> number" for each line of text I submit to "eval(parse(t=text.i2))" ).
>
> Is there a better/preferred/more flexible way to do this?

Beware: what follows is the advice of someone used to working with data in an RDBMS and SQL; as everyone knows, everything looks like a nail to a man with a hammer. Caveat emptor...

Unless I misunderstand you, you are trying to process piecewise a large dataset made of a large number of reasonably sized, independent chunks. To me, what you are doing looks a bit like reinventing the SAS macro language. What's the point?

IMNSHO, "large" datasets that are only ever used piecewise are much better handled in a real database (RDBMS), queried at run time via, for example, Brian Ripley's RODBC.

In your example, I'd create a table Births holding all your data plus the relevant year. Off the top of my head:

    # Do this ONCE in the lifetime of your data: an RDBMS is probably more
    # apt than R data frames for this kind of management
    library(RODBC)
    channel <- odbcConnect(WhateverYouHaveToUseForYourFavoriteDBMS)
    sqlSave(channel, tablename="Births",
            dat=rbind(cbind(data.frame(Year=rep(1952, nrow(births.1952))),
                            births.1952),
                      cbind(data.frame(Year=rep(1953, nrow(births.1953))),
                            births.1953)
                      # ... ^W^Y ad nauseam: one cbind() per further year ...
                      ))
    rm(births.1951, births.1952, ...) # get back breathing space

Beware: certain data types may be tricky to save! I got bitten by Dates recently... See the RODBC documentation, your DBMS documentation and the "R Data Import/Export" guide.

At analysis time, you can use the result of the relevant query exactly like one of your data frames. Instead of:

    foo(... data=births.1952, ...)

type:

    foo(... data=sqlQuery(channel, "select * from \"Births\" where \"Year\"=1952;"), ...)
    # Syntax illustrating talking to a "picky" DBMS...

Furthermore, the variable "Year" bears your "d" information. Problem (dis)solved. You may loop (or even sapply()...) at will on d:

    for (year in 1952:1978) {
      query <- sprintf("select * from \"Births\" where \"Year\"=%d;", year)
      foo(... data=sqlQuery(channel, query), ...)
      ...
    }

If you already use a DBMS with some connection to R (via RODBC or otherwise), use that.
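
Incidentally, the rbind(cbind(...), ...) call for the one-time load does not have to be typed out year by year. Here is a minimal sketch, assuming the per-year data frames sit in your workspace under names like births.1952; the helper name stack_years and the toy data are made up for illustration. get() looks an object up by its name, so no eval(parse()) is needed:

```r
# Hypothetical helper: stack data frames named <prefix>.<year> into one
# data frame with a leading Year column, ready to be saved to the DBMS.
stack_years <- function(prefix, years, envir = parent.frame()) {
  pieces <- lapply(years, function(year) {
    # Fetch e.g. births.1952 by name instead of building code as text
    df <- get(paste(prefix, year, sep = "."), envir = envir)
    cbind(data.frame(Year = rep(year, nrow(df))), df)
  })
  do.call(rbind, pieces)
}

# Toy stand-ins for the real per-year data frames (invented values)
births.1952 <- data.frame(region = c("A", "B"), births = c(10, 12))
births.1953 <- data.frame(region = c("A", "B"), births = c(11, 15))

births <- stack_years("births", 1952:1953)
# 'births' can then be handed to sqlSave() with tablename = "Births"
```

The same helper would serve for temp.<year>, gdp.<year> and so on by changing the prefix, provided all years of a given prefix share the same columns.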

If not, sqlite is a very lightweight library that lets you use a (very considerable) subset of SQL92 to manipulate your data. I understand that some people on this list have undertaken the creation of a sqlite-based package dedicated to this kind of large-data management.

HTH,

Emmanuel Charpentier
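
PS: for the sqlite route, here is a minimal sketch, assuming the DBI and RSQLite packages are installed (this is one possible interface, not necessarily the package alluded to above). The in-memory database and the toy data frames are assumptions for illustration; a real setup would point dbConnect() at a file:

```r
library(DBI)

# ":memory:" gives a throwaway database; use a file path to keep the data
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy stand-ins for the real per-year data frames (invented values)
births.1952 <- data.frame(region = c("A", "B"), births = c(10, 12))
births.1953 <- data.frame(region = c("A", "B"), births = c(11, 15))

# One-time load: append each year to a single Births table
for (year in 1952:1953) {
  df <- get(paste("births", year, sep = "."))
  dbWriteTable(con, "Births", cbind(Year = year, df), append = TRUE)
}

# Analysis time: pull back one year, with the year passed as a parameter
one_year <- dbGetQuery(con,
                       'SELECT * FROM "Births" WHERE "Year" = ?',
                       params = list(1952))

dbDisconnect(con)
```

Passing the year through params rather than pasting it into the query text plays the same role as the sprintf() call in the RODBC loop.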