[R] Automate a data load and merge

Jon Loehrke Fri, 12 Jun 2009 06:30:04 -0700

Hi R list,
        I would like to automate, or at least speed up, the process by which I take
several separate datasets stored in .csv format, import them, and merge
them by a common variable.  So far I have greatly sped up the loading
process, but I cannot think of a way to automate the merging of all
datasets into a common data.frame.
        My apologies if this has been covered; any R search suggestions are
appreciated.


# All scripts function out of the base directory
rm(list=ls())
setwd('/Users/myuser/Documents/workfolder/')

# Check files and list all .csv in directory
# (the pattern is a regular expression, so the dot is escaped and anchored)
files<-list.files(pattern='\\.csv$')
# Create labels for each file (ex. June08.csv becomes June08)
labels<-sub('\\.csv$', '', files)

# Load all .csv datasets and assign name

item<-character() # start with an empty index of all items in datasets
for(i in seq_along(files)){
        X<-read.csv(files[i])
        item<-union(item, X$Item_Name)
        assign(labels[i], X)
        }
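
As an alternative to assign(), the files could be read into a single named list, which is easier to loop over later. This is only a minimal sketch: the temporary directory and the two small example files are made up for illustration, and only the column names Item_Name and Occurance come from the data described in this post.

```r
# Hypothetical example data written to a temporary directory
tmp <- tempfile("csvdemo")
dir.create(tmp)
write.csv(data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30, 50)),
          file.path(tmp, "June01.csv"), row.names = FALSE)
write.csv(data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(30, 30)),
          file.path(tmp, "June03.csv"), row.names = FALSE)

# List only files ending in .csv and derive labels from the file names
files  <- list.files(tmp, pattern = "\\.csv$")
labels <- sub("\\.csv$", "", files)

# Read every file into one named list instead of separate objects
datasets <- lapply(file.path(tmp, files), read.csv, stringsAsFactors = FALSE)
names(datasets) <- labels

# Union of all item names across the datasets
item <- unique(unlist(lapply(datasets, function(d) d$Item_Name)))
```

With a list, each dataset is reached as datasets[["June01"]] rather than by a variable name, so adding a new file to the directory needs no change to the code.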
# What is loaded
ls()
# [1] "files"    "i"        "item"     "June01" "June02" "June03" "labels"

# What does everything look like?
str(June03)
#'data.frame':  992 obs. of  8 variables:
# $ Item_Name        : Factor w/ 992 levels "Birds","Fish",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance     : int  30 30 50 450 75 550 100 500 250 75 ...

str(June01)
#'data.frame':  819 obs. of  8 variables:
# $ Item_Name        : Factor w/ 819 levels "Birds","Turtles",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Occurance     : int  30 50 450 750 550 100 500 250 275 450 ...

# Here is where I'm stuck...
# I would like to:
#       Create a data.frame with an index column composed of the union of
#       all items
#       Create columns in the frame by merging the 'Occurance' column of each
#       loaded dataset, labeled by the dataset's name (e.g. June01)
#       Automate this procedure so that I do not have to manually type in
#       each column addition when I have a new dataset.
        
# This is my current strategy, but when I have new datasets I have to
# manually set up the preallocation and merge

allData<-data.frame(Item=item, June01=NA, June02=NA, June03=NA)
allData$June01[match(June01$Item_Name, allData$Item)] <- June01$Occurance
allData$June02[match(June02$Item_Name, allData$Item)] <- June02$Occurance
allData$June03[match(June03$Item_Name, allData$Item)] <- June03$Occurance

# Any help to automate this process is greatly appreciated!!!
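
For reference, the manual strategy above could perhaps be generalized with a loop over a named list of datasets. This is a sketch under assumptions, not a tested solution for the real files: the list name datasets and the two tiny data frames are hypothetical stand-ins, and it assumes each item appears at most once per dataset.

```r
# Hypothetical stand-ins for the loaded datasets
datasets <- list(
  June01 = data.frame(Item_Name = c("Birds", "Turtles"), Occurance = c(30L, 50L),
                      stringsAsFactors = FALSE),
  June03 = data.frame(Item_Name = c("Birds", "Fish"), Occurance = c(30L, 30L),
                      stringsAsFactors = FALSE)
)

# Index column: union of all item names
item <- unique(unlist(lapply(datasets, function(d) as.character(d$Item_Name))))
allData <- data.frame(Item = item, stringsAsFactors = FALSE)

# One column per dataset; items absent from a dataset become NA
for (nm in names(datasets)) {
  d <- datasets[[nm]]
  allData[[nm]] <- d$Occurance[match(allData$Item, as.character(d$Item_Name))]
}
```

If the datasets already exist as separate objects created by assign(), something like mget(labels, envir = globalenv()) should collect them into such a list, so a new file only adds one more element and one more column.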

sessionInfo()
#R version 2.9.0 (2009-04-17)
#i386-apple-darwin8.11.1
#
#locale:
#en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  methods   base


Jon Loehrke
Graduate Research Assistant
Department of Fisheries Oceanography
School for Marine Science and Technology
University of Massachusetts
200 Mill Road, Suite 325
Fairhaven, MA 02719
jloeh...@umassd.edu
T 508-910-6393
F 508-910-6396



