Hi, I have some very large (~1.1 GB) output files from a groundwater model called STOMP that I want to read as efficiently as possible. For each variable there are over 1 million values to read. Variables are not organized in columns; instead they are written out in sections in the file, like this:
    X-Direction Node Positions, m
    5.931450000E+05 5.931550000E+05 5.931650000E+05 5.931750000E+05 5.932450000E+05 5.932550000E+05 5.932650000E+05 5.932750000E+05
    . . .
    5.946950000E+05 5.947050000E+05 5.947150000E+05 5.947250000E+05 5.947950000E+05 5.948050000E+05 5.948150000E+05 5.948250000E+05
    Y-Direction Node Positions, m
    1.148050000E+05 1.148050000E+05 1.148050000E+05 1.148050000E+05 1.148050000E+05 1.148050000E+05 1.148050000E+05 1.148050000E+05
    . . .
    1.171950000E+05 1.171950000E+05 1.171950000E+05 1.171950000E+05 1.171950000E+05 1.171950000E+05 1.171950000E+05 1.171950000E+05
    Z-Direction Node Positions, m
    9.550000000E+01 9.550000000E+01 9.550000000E+01 9.550000000E+01 9.550000000E+01 9.550000000E+01 9.550000000E+01 9.550000000E+01
    . . .

I want to read and use only a subset of the variables. I wrote the function below to find the line where each target variable begins and then scan the values, but it still seems rather slow, perhaps because I am opening and closing the file for each variable. Can anyone suggest a faster way?

    # Reads original STOMP plot file (plot.*) directly. Should be useful when the plot files are
    # very large with lots of variables, and you just want to retrieve a few of them.
    # Arguments: 1) plot filename, 2) number of nodes,
    #            3) character vector of names of target variables you want to return.
    # Returns a list with the selected plot output.
    READ.PLOT.OUTPUT6 <- function(plt.file, num.nodes, var.names) {
      lines <- readLines(plt.file)  # whole file read into memory, used only to locate headers
      num.vars <- length(var.names)
      tmp <- list()
      for(i in 1:num.vars) {
        # line number of the header for this variable
        ind <- grep(var.names[i], lines, fixed=TRUE, useBytes=TRUE)
        if(length(ind) != 1) stop("Not one line in the plot file with matching variable name.\n")
        # scan() reopens plt.file and reads through the first 'ind' lines again on every iteration
        tmp[[i]] <- scan(plt.file, skip=ind, nmax=num.nodes, quiet=TRUE)
      }
      return(tmp)
    } # end READ.PLOT.OUTPUT6()

Regards,
Scott Waichler
Pacific Northwest National Laboratory
Richland, WA, USA
scott.waich...@pnnl.gov

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
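A minimal sketch of one alternative, under the assumption that section header lines are the only lines whose first non-blank character is a letter (true of the excerpt above, but not checked against the full STOMP plot-file format). In READ.PLOT.OUTPUT6(), each `scan(plt.file, skip=ind, ...)` call reopens the file and reads through the first `ind` lines again, and each `grep()` searches every line of the file. Here the file is read once, only the header lines are searched, and each block of values is parsed from lines already in memory. `read_plot_once()` is a hypothetical name; it drops `num.nodes` because a block is taken to end at the next header line.

```r
# Hypothetical helper (not from the original post): read the plot file once,
# then parse each requested block from the lines already in memory.
# Assumption: header lines are the only lines whose first non-blank
# character is a letter.
read_plot_once <- function(plt.file, var.names) {
  lines <- readLines(plt.file)
  hdr <- grep("^[[:space:]]*[A-Za-z]", lines)  # indices of all header lines
  out <- vector("list", length(var.names))
  names(out) <- var.names
  for (v in var.names) {
    # Search only the header lines, not the whole file
    ind <- hdr[grepl(v, lines[hdr], fixed = TRUE)]
    if (length(ind) != 1L)
      stop("Expected exactly one header line matching '", v, "'")
    # The block runs up to the line before the next header (or end of file)
    nxt <- hdr[hdr > ind]
    last <- if (length(nxt) > 0L) nxt[1L] - 1L else length(lines)
    out[[v]] <- if (last > ind) {
      scan(text = lines[(ind + 1L):last], quiet = TRUE)
    } else {
      numeric(0)
    }
  }
  out
}

# Usage on a tiny synthetic file in the same layout as the excerpt
demo.file <- tempfile()
writeLines(c("X-Direction Node Positions, m",
             " 1.0E+00 2.0E+00 3.0E+00",
             " 4.0E+00",
             "Y-Direction Node Positions, m",
             " 5.0E+00 6.0E+00",
             " 7.0E+00 8.0E+00"), demo.file)
res <- read_plot_once(demo.file, c("Y-Direction Node Positions",
                                   "X-Direction Node Positions"))
```

The synthetic file only exercises the parsing; whether this is fast enough on a 1.1 GB file would need timing on the real data, for example with `system.time()`.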
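If keeping the whole 1.1 GB file in memory as a character vector is itself a problem, a chunked, single-pass reader is another option. This sketch (hypothetical function `read_plot_stream()` with a hypothetical `chunk.size` parameter, and the assumption that header lines are the only lines whose first non-blank character is a letter) opens one connection, reads `chunk.size` lines at a time with `readLines()`, tags each line with the header of the section it belongs to (carrying the current header across chunk boundaries), and parses only the lines from the requested sections. Unlike READ.PLOT.OUTPUT6() it does not check that each name matches exactly one header: a name matching several headers would get their values concatenated.

```r
# Hypothetical streaming variant (not from the original post): read the file
# in chunks over a single open connection so the whole file never sits in
# memory at once.
# Assumption: header lines are the only lines whose first non-blank
# character is a letter.
read_plot_stream <- function(plt.file, var.names, chunk.size = 100000L) {
  con <- file(plt.file, open = "r")
  on.exit(close(con))
  # One numeric vector per target variable, grown chunk by chunk
  out <- lapply(setNames(var.names, var.names), function(v) numeric(0))
  cur <- NA_character_  # header of the section the previous chunk ended in
  repeat {
    lines <- readLines(con, n = chunk.size)
    if (length(lines) == 0L) break  # end of file
    is.hdr <- grepl("^[[:space:]]*[A-Za-z]", lines)
    # Tag every line with the most recent header seen so far
    sect <- c(cur, lines[is.hdr])[cumsum(is.hdr) + 1L]
    cur <- sect[length(sect)]
    for (v in var.names) {
      keep <- !is.hdr & !is.na(sect) & grepl(v, sect, fixed = TRUE)
      if (any(keep))
        out[[v]] <- c(out[[v]], scan(text = lines[keep], quiet = TRUE))
    }
  }
  out
}

# Usage on a tiny synthetic file; chunk.size = 2L forces sections to
# straddle chunk boundaries
demo.file <- tempfile()
writeLines(c("X-Direction Node Positions, m",
             " 1.0E+00 2.0E+00 3.0E+00",
             " 4.0E+00",
             "Y-Direction Node Positions, m",
             " 5.0E+00 6.0E+00",
             " 7.0E+00 8.0E+00"), demo.file)
res <- read_plot_stream(demo.file, c("X-Direction Node Positions",
                                     "Y-Direction Node Positions"),
                        chunk.size = 2L)
```

On real files a much larger chunk keeps the number of `readLines()` calls small while still bounding memory use.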