On Dec 11, 2009, at 11:08 AM, Tom Knockinger wrote:

Hi,
I am new to R, but until now I have found solutions for every problem in tutorials, R wikis and this mailing list. Now I have some problems which I can't solve with this knowledge.

I have some data like this:

# sample data
head1 = "a;b;c;d;e;f;g;h;i;k;l;m;n;o"
data1 = "1;1;1;1;1;1;1;1;1;1;1;1;1;1"
data2 = "2;2;2;2;2;2;2;2;2;2;2;2;2;2"
data3 = "3;3;3;3;3;3;3;3;3;3;3;3;3;3"
datastring = paste("", head1,data1,data2,data3,"",sep="\n")

# import operation
res = read.table(textConnection(datastring), header=TRUE, sep = c(";"))
closeAllConnections()

# I use these two lines in a for-loop like this:
#for( j in 1:length(data)) {
#       res[[j]] = read.table(textConnection(datastring[j]), header=TRUE, sep = c(";"))
#       closeAllConnections()
#}

I get these strings from a file which contains about 50 to 1000 of them, so I can read them all into a list. I am not sure if there is a better way to do this, but it works for me. Maybe you have some suggestions for a better solution.
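
One possible alternative to the loop above, as a sketch only: keep the strings in a character vector and let lapply() build the list, closing each text connection explicitly instead of calling closeAllConnections(). Connections that are never closed and are no longer referenced get closed later by the garbage collector with a warning, so closing them at the point of use may also help with the warnings described below. The names `datastrings` and `read_block` are hypothetical and only serve the illustration.

```r
# Hypothetical input: a character vector with one semicolon-separated block per element
head1 <- "a;b;c;d;e;f;g;h;i;k;l;m;n;o"
rows  <- c("1;1;1;1;1;1;1;1;1;1;1;1;1;1",
           "2;2;2;2;2;2;2;2;2;2;2;2;2;2")
datastrings <- rep(paste(c(head1, rows), collapse = "\n"), 3)

# Parse one block; on.exit() closes only the connection opened here,
# even if read.table() fails
read_block <- function(s) {
  con <- textConnection(s)
  on.exit(close(con))
  read.table(con, header = TRUE, sep = ";")
}

# One data frame per input string, collected in a list
res <- lapply(datastrings, read_block)
```

Each element of `res` is then a data frame that can be accessed with `res[[j]]`.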

Now, after this short introduction to the R program I use, I have two problems with this approach.

1) warnings
I get warnings like "unused connection 3 (datastring) closed" from time to time after some other operations. But all connections should already be closed, and I don't create new ones.

2) ram usage and program shutdowns
length(data) is usually between 50 and 1000, so it takes some space in RAM (approx. 100-200 MB), which is no problem. I also use some analysis code which results in about 500-700 MB of RAM usage, also not a real problem. The results are matrices (50x14 to 1000x14), so they are small enough to work with afterwards: create plots or do some more analysis. So I wrote a function which does the analysis one file after another and keeps only the results in a list. But after about 2-4 files my R process uses about 1500 MB and then the troubles begin.

Windows?

The R console terminates or prints the error that no more space can be allocated. So I have to process each file separately, save each result in a file, and restart R after every 2 processed files. I have to do that 3-5 times until all files are processed, which is a bit annoying.

I did some research on this problem and found out that
-) after I import the data into the same variable, the RAM usage goes up by about 100-200 MB each time instead of the old data being reused or purged. The old data should be overwritten, since it is no longer accessible after I import a new file.
-) the same occurs with the analysis functions, which use much more space and also don't release old variables that are no longer used. But ls() doesn't show them at all.
-) even after I clear all variables with "rm(list=ls(all=TRUE))", the used RAM is still the same.

So is there a way to get the RAM back, so that I can do all the analysis in one session and don't have to mess around with additional files?

It is possible to call the garbage collector with gc(). Supposedly that should not be necessary, since garbage collection is automatic, but I have the impression that it helps prevent situations that otherwise lead to virtual memory getting invoked on the Mac (which I also thought should not be happening, but I will swear that it does.)
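
As a rough sketch of how gc() might be used between files: wrap the per-file work in a function so that large intermediates go out of scope when it returns, keep only the small result, and call gc() after each file. The function `analyse_one` and the sizes used here are hypothetical stand-ins for the real analysis, and whether freed memory is actually returned to the operating system depends on the platform.

```r
# Hypothetical stand-in for the real per-file analysis
analyse_one <- function(n) {
  big <- matrix(rnorm(n * 14), ncol = 14)  # large intermediate, local to this call
  colMeans(big)                            # small result that is kept
}

results <- vector("list", 3)
for (i in seq_along(results)) {
  results[[i]] <- analyse_one(1e5)
  invisible(gc())  # request a collection now instead of waiting for an automatic one
}

# gc() also returns a matrix of current memory usage (rows "Ncells" and "Vcells")
mem <- gc()
```

Printing `mem` before and after a file is processed gives a quick view of how much memory R itself considers to be in use.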

--
David


Thanks for your help

Tom

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
