You might want to run 'object.size' on myData to see how big it is. If you try to run reshape again, check whether any paging is happening on your system, which may indicate that you don't have enough memory. Also, with 53M observations, it may take a lot of time to determine how to do the reshape.
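A minimal sketch of that size check, assuming a small made-up stand-in for myData so that it runs anywhere (the `demo` data frame and its values are hypothetical; only the 4-column character layout follows the data described in this thread):

```r
# Small stand-in for myData (hypothetical values; the real object has ~53.8M rows)
n <- 10000
demo <- data.frame(V1 = as.character(sample(200000:299999, n, replace = TRUE)),
                   V2 = sprintf("cv%04d", sample(1:50, n, replace = TRUE)),
                   V3 = sample(c("A", "B"), n, replace = TRUE),
                   V4 = sample(c("A", "B"), n, replace = TRUE),
                   stringsAsFactors = FALSE)

# Size of the object in memory
size_bytes <- object.size(demo)
print(size_bytes, units = "Mb")

# gc() reports the memory R currently has in use; calling it before and
# after a reshape gives a rough idea of how much extra the step needed
mem <- gc()
print(mem)
```

Paging itself is not visible from inside R; it has to be watched from the operating system, for example with top or vmstat on Linux while the reshape is running.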
You can also approach the problem in parts. Take 10K observations and see how long the reshape takes and how much memory it uses; then try 100K, then 1M. With something this big, starting from subsets shows how time and memory grow, which gives you an idea of what the complete set will need. When you do the runs, report back on the memory and CPU time required.

On Mon, Jul 12, 2010 at 9:19 AM, Juliet Hannah <juliet.han...@gmail.com> wrote:
> Hi Jim,
>
> Thanks for responding. Here is the info I should have included before.
> I should be able to access 4 GB.
>
>> str(myData)
> 'data.frame':   53860857 obs. of  4 variables:
>  $ V1: chr  "200003" "200006" "200047" "200050" ...
>  $ V2: chr  "cv0001" "cv0001" "cv0001" "cv0001" ...
>  $ V3: chr  "A" "A" "A" "B" ...
>  $ V4: chr  "B" "B" "A" "B" ...
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> On Mon, Jul 12, 2010 at 7:54 AM, jim holtman <jholt...@gmail.com> wrote:
>> What is the configuration you are running on (OS, memory, etc.)?  What
>> does your object consist of?  Is it numeric, factors, etc.?  Provide a
>> 'str' of it.  If it is numeric, then the size of the object is
>> probably about 1.8GB.  Doing the long to wide you will probably need
>> at least that much additional memory to hold the copy, if not more.
>> This would be impossible on a 32-bit version of R.
>>
>> On Mon, Jul 12, 2010 at 1:25 AM, Juliet Hannah <juliet.han...@gmail.com>
>> wrote:
>>> I have a data set that has 4 columns and 53860858 rows. I was able to
>>> read this into R with:
>>>
>>> cc <- rep("character",4)
>>> myData <-
>>> read.table("myData.csv",header=FALSE,skip=1,colClasses=cc,nrow=53860858,sep=",")
>>>
>>>
>>> I need to reshape this data from long to wide. On a small data set the
>>> following lines work. But on the real data set, it didn't finish even
>>> when I took a sample of two (rows in new data). I didn't receive an
>>> error. I just stopped it because it was taking too long. Any
>>> suggestions for improvements? Thanks.
>>>
>>> # start example
>>> # i have commented out the write.table statement below
>>>
>>> testData <- read.table(textConnection("rs9999853,cv0084,A,A
>>> rs999986,cv0084,C,B
>>> rs9999883,cv0084,E,F
>>> rs9999853,cv0085,G,H
>>> rs999986,cv0085,I,J
>>> rs9999883,cv0085,K,L"),header=FALSE,sep=",")
>>> closeAllConnections()
>>>
>>> mysamples <- unique(testData$V2)
>>>
>>> for (one_ind in mysamples) {
>>>   one_sample <- testData[testData$V2==one_ind,]
>>>   mywide <- reshape(one_sample, timevar = "V1", idvar =
>>> "V2",direction = "wide")
>>> #   write.table(mywide,file
>>> ="newdata.txt",append=TRUE,row.names=FALSE,col.names=FALSE,quote=FALSE)
>>> }
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
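The subset-timing approach suggested at the top of this message could be sketched roughly as follows. The helper `make_long` and the tiny sizes are assumptions for illustration only; with the real data you would take subsets of myData instead and scale the sizes up toward 10K, 100K and 1M rows.

```r
# Hypothetical helper: build a long data set in the same 4-column layout
# as myData, with n_snp rows for each of n_samples samples
make_long <- function(n_samples, n_snp) {
  data.frame(V1 = rep(sprintf("rs%06d", seq_len(n_snp)), times = n_samples),
             V2 = rep(sprintf("cv%04d", seq_len(n_samples)), each = n_snp),
             V3 = sample(c("A", "B"), n_samples * n_snp, replace = TRUE),
             V4 = sample(c("A", "B"), n_samples * n_snp, replace = TRUE),
             stringsAsFactors = FALSE)
}

sizes <- c(100, 200, 400)   # rows per subset; deliberately tiny for the sketch
results <- data.frame(rows = sizes, elapsed = NA_real_, size_bytes = NA_real_)

for (i in seq_along(sizes)) {
  long <- make_long(n_samples = 2, n_snp = sizes[i] / 2)
  # system.time() records CPU and elapsed time for the reshape
  timing <- system.time(
    wide <- reshape(long, timevar = "V1", idvar = "V2", direction = "wide")
  )
  results$elapsed[i] <- timing[["elapsed"]]
  results$size_bytes[i] <- as.numeric(object.size(wide))
}

# One row per subset size: compare how time and memory grow
print(results)
```

Comparing the rows of `results` shows whether time grows roughly linearly or much faster with the number of observations, which is the information needed to estimate the cost of the full run.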