I’ve got an Excel workbook with about 30 worksheets. Each worksheet has 10000 rows of data over 30 columns.
I’d like to read the data from each worksheet into a data frame or matrix in R for processing. Normally, I use read.csv when interacting with Excel, but I’d rather manipulate a multisheet workbook directly than split the original workbook and save each part as a csv.

So far, I’ve tried using read.xlsx from the xlsx package. This works fine for small test files. For example, suppose I’m trying to read from the test_file workbook on my desktop. The following code extracts rows 1 and 2 from the worksheet named “johnny”.

    setwd("C:\\Documents and Settings\\dmenezes\\Desktop")
    info <- read.xlsx("test_file.xlsx", sheetName="johnny", rowIndex=1:2, header=FALSE)
    info

However, when I try to apply this to my real, large workbook, things go wrong, with the following error message. Any ideas/workarounds?

    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
      java.lang.OutOfMemoryError: Java heap space
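
A minimal sketch of the multi-sheet read described above, using the xlsx package. The helper name read_all_sheets and the 1024m heap size are hypothetical. Whether a larger Java heap is enough for a workbook of this size is an assumption, not something confirmed here. The java.parameters option only takes effect if it is set before rJava (and therefore xlsx) is loaded in the session.

```r
# Assumption: a larger JVM heap may avoid the OutOfMemoryError.
# This must run before rJava/xlsx is loaded; "-Xmx1024m" is illustrative.
options(java.parameters = "-Xmx1024m")

# Hypothetical helper: read every worksheet of a workbook into a named
# list of data frames, one list element per worksheet.
read_all_sheets <- function(path) {
  wb <- xlsx::loadWorkbook(path)
  sheet_names <- names(xlsx::getSheets(wb))
  sheets <- lapply(sheet_names, function(s) {
    # read.xlsx2 is the xlsx package's variant intended for larger sheets
    xlsx::read.xlsx2(path, sheetName = s, header = FALSE,
                     stringsAsFactors = FALSE)
  })
  names(sheets) <- sheet_names
  sheets
}

# Usage (not run here): all_data <- read_all_sheets("test_file.xlsx")
#                       all_data[["johnny"]]
```

Returning a named list keeps each worksheet addressable by its sheet name, so later processing can loop over the list with lapply.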