With the *data.table* package, *R* can use *fread* as follows:

> grab<- function(file)
> {
>  fin<- fread(file=file,
>  sep=NULL,
>  dec=".",
>  quote="", nrows=Inf, header=FALSE,
>  stringsAsFactors=FALSE, verbose=FALSE,
>  col.names=c("record"),
>  check.names=FALSE, fill=FALSE, blank.lines.skip=FALSE,
>  showProgress=TRUE,
>  data.table=FALSE, skip=0,
>  nThread=2, logical01=FALSE, keepLeadingZeros=FALSE)
>  cat(sprintf("Read '%s'.\n", file))
>  #
>  substance<- apply(X=fin, MARGIN=1, FUN=function(r) chartr(",", "\t", r[1]))
>  cat(sprintf("Translated '%s'.\n", file))
>  D<- fread(text=substance,
>  sep="\t",
>  dec=".",
>  quote="", nrows=Inf, header=FALSE,
>  stringsAsFactors=FALSE, verbose=FALSE,
>  col.names=c("ip", "valid.hits", "err.hits", "megabytes"),
>  check.names=FALSE, fill=FALSE, blank.lines.skip=FALSE,
>  showProgress=TRUE,
>  data.table=FALSE, skip=0,
>  nThread=2, logical01=FALSE, keepLeadingZeros=FALSE)
>  cat(sprintf("Parsed '%s'.\n", file))
>  ip<- D$ip
>  withinBlock<- sapply(X=ip, FUN=function(s) as.integer((strsplit(x=s, 
> split=".", fixed=TRUE)[[1]])[4]))
>  D$within.block<- withinBlock
>  return(D)
> }
> 

In short, one pass pulls in all the records into an internal structure, which 
can be edited or manipulated at will, and then a second call to *fread* parses 
it properly. 

*fread *is fast, even for big datasets.


--
Jan Galkowski 

https://www.linkedin.com/in/deepdevelopment

member,

... American Statistical Association
... International Society for Bayesian Analysis
... Ecological Society of America
... International Association of Survey Statisticians
... American Association for the Advancement of Science
... TeX Users Group

(pronouns: *he, him, his*)

*Keep your energy local*. --John Farrell, *ILSR <http://ilsr.org/>*


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to