With the *data.table* package, *R* can use *fread* as follows:

> library(data.table)
>
> grab<- function(file)
> {
>   fin<- fread(file=file,
>               sep=NULL,
>               dec=".",
>               quote="", nrows=Inf, header=FALSE,
>               stringsAsFactors=FALSE, verbose=FALSE,
>               col.names=c("record"),
>               check.names=FALSE, fill=FALSE, blank.lines.skip=FALSE,
>               showProgress=TRUE,
>               data.table=FALSE, skip=0,
>               nThread=2, logical01=FALSE, keepLeadingZeros=FALSE)
>   cat(sprintf("Read '%s'.\n", file))
>   #
>   substance<- apply(X=fin, MARGIN=1, FUN=function(r) chartr(",", "\t", r[1]))
>   cat(sprintf("Translated '%s'.\n", file))
>   D<- fread(text=substance,
>             sep="\t",
>             dec=".",
>             quote="", nrows=Inf, header=FALSE,
>             stringsAsFactors=FALSE, verbose=FALSE,
>             col.names=c("ip", "valid.hits", "err.hits", "megabytes"),
>             check.names=FALSE, fill=FALSE, blank.lines.skip=FALSE,
>             showProgress=TRUE,
>             data.table=FALSE, skip=0,
>             nThread=2, logical01=FALSE, keepLeadingZeros=FALSE)
>   cat(sprintf("Parsed '%s'.\n", file))
>   ip<- D$ip
>   withinBlock<- sapply(X=ip, FUN=function(s) as.integer((strsplit(x=s,
>                        split=".", fixed=TRUE)[[1]])[4]))
>   D$within.block<- withinBlock
>   return(D)
> }
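For a quick try-out, here is a minimal sketch of the same two-pass idea on a tiny made-up file. The sample addresses and counts are invented for illustration, and the variable names (tmp, raw, tabbed) are mine. It also swaps the apply() and sapply() loops for vectorized chartr() and sub(), which I would expect to give the same values, but treat that as an assumption rather than a tested replacement on real logs.

```r
library(data.table)

# Hypothetical sample: ip, valid hits, error hits, megabytes; no header line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("10.0.0.7,12,1,0.5",
             "192.168.1.42,3,0,1.25",
             "172.16.5.200,8,4,2.0"), tmp)

# Pass 1: sep=NULL tells fread not to split, so each line is one string.
raw <- fread(file = tmp, sep = NULL, header = FALSE, quote = "",
             col.names = "record", data.table = FALSE)

# Edit the raw records. chartr() is vectorized over the whole column.
tabbed <- chartr(",", "\t", raw$record)

# Pass 2: parse the edited records as tab-separated text.
D <- fread(text = tabbed, sep = "\t", header = FALSE, quote = "",
           col.names = c("ip", "valid.hits", "err.hits", "megabytes"),
           data.table = FALSE)

# Fourth dot-separated field of each address: drop everything up to the last dot.
D$within.block <- as.integer(sub(".*\\.", "", D$ip))

unlink(tmp)
print(D)
```

Any other editing of the raw strings (dropping malformed records, repairing delimiters) would go between the two fread calls, where tabbed is built.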
In short, one pass reads every record into an in-memory structure as a single string per line, which can be edited or manipulated at will, and a second call to *fread* then parses the edited text properly. *fread* is fast, even for big datasets.

--
Jan Galkowski
https://www.linkedin.com/in/deepdevelopment

member,
... American Statistical Association
... International Society for Bayesian Analysis
... Ecological Society of America
... International Association of Survey Statisticians
... American Association for the Advancement of Science
... TeX Users Group
(pronouns: *he, him, his*)

*Keep your energy local*. --John Farrell, *ILSR <http://ilsr.org/>*