Hi there, I am having a similar problem reading in a large text file with around 550,000 observations, each with 10 to 100 lines of description. I am trying to parse it in R, but I am having trouble with the size of the file: the parsing slows down dramatically at some point. I would be happy for any suggestions. Here is my code, which works fine on a subsample of my dataset.
# Defining data source
file <- "filename.txt"

# Creating placeholder for data and assigning column names
data <- data.frame(Id = NA)

# Starting with case = 0
case <- 0

# Opening a connection to the data
input <- file(file, "rt")

# Going through cases
repeat {
  line <- readLines(input, n = 1)
  if (length(line) == 0) break
  if (length(grep("Id:", line)) != 0) {
    case <- case + 1
    data[case, ] <- NA
    split_line <- strsplit(line, "Id:")
    data[case, 1] <- as.numeric(split_line[[1]][2])
  }
}

# Closing connection
close(input)

# Saving data frame
write.csv(data, "data.csv")

Kind regards,
Frederik
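P.S. For comparison, here is a minimal sketch of the same loop written so that the result is preallocated as a numeric vector instead of the data frame being grown one row at a time (growing a data frame row by row is a common cause of this kind of slowdown). The preallocation size of 600000 and the use of grepl() are assumptions on my part, and I have not benchmarked this on the full file:

# Sketch: same parsing, but filling a preallocated vector.
# 600000 is only a guessed upper bound on the number of cases.
ids <- numeric(600000)
case <- 0

input <- file("filename.txt", "rt")
repeat {
  line <- readLines(input, n = 1)
  if (length(line) == 0) break
  if (grepl("Id:", line)) {
    case <- case + 1
    ids[case] <- as.numeric(strsplit(line, "Id:")[[1]][2])
  }
}
close(input)

# Keep only the cases actually filled in, then save as before
data <- data.frame(Id = ids[seq_len(case)])
write.csv(data, "data.csv")

The vector is trimmed to the number of cases actually found before it is converted to a data frame and written out.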