Re: [R] Pulling strings from a Flat file

David Winsemius Tue, 05 Apr 2011 20:00:36 -0700


On Apr 5, 2011, at 7:48 PM, Kalicin, Sarah wrote:

Hi,
I have a flat file that contains a bunch of strings that look like this. The file was originally in Unix and brought over into Windows:
E123456E234567E345678E456789E567891E678910E. . . .
Basically the string starts with E and is followed by 6 numbers. One string = E123456, length = 7 characters. This file contains 10,000's of these strings. I want to separate them into one vector the length of the number of strings in the flat file, where each string is its own unique value.
cc<-c(7,7,7,7,7,7,7)
aa<- file("Master","r", raw=TRUE)
readChar(aa, cc, useBytes = FALSE)
[1] "E123456" "\nE23456" "7\nE3456" "78\nE456" "789\nE56" "7891\nE6" "78910\nE"
close(aa)
unlink("Master")


> txt <- "E123456E234567E345678E456789E567891E678910E"
# You could use readLines to bring in from the file
# and assign to a character vector for work in R.

> gsub("(E[[:digit:]]{6})", "\\1\n", txt)
[1] "E123456\nE234567\nE345678\nE456789\nE567891\nE678910\nE"
# Seems to be "working" properly

> ?scan

> scan(textConnection(gsub("(E[[:digit:]]{6})", "\\1\n", txt)),what="character")

Read 7 items
[1] "E123456" "E234567" "E345678" "E456789" "E567891" "E678910" "E"

You might be able to use read.table or variants.
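
If the goal is just the vector of codes, the newline round trip through scan() can be skipped. A minimal sketch, assuming the whole file content is already in a single string (txt below is the same made-up sample as above, not the real file): regmatches() with gregexpr() pulls out every complete match directly, so the dangling "E" at the end is dropped rather than returned as a seventh item.

```r
# Hypothetical sample input standing in for the real file contents.
txt <- "E123456E234567E345678E456789E567891E678910E"

# Extract every complete code: an "E" followed by exactly six digits.
# gregexpr() locates all matches; regmatches() returns them as a list
# with one element per input string, hence the [[1]].
codes <- regmatches(txt, gregexpr("E[[:digit:]]{6}", txt))[[1]]

print(codes)
length(codes)
```

Whether the incomplete trailing "E" should be kept or dropped depends on what it means in the real data; this sketch drops it.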

The biggest issue is I am getting \n added into the string, which I am not sure where it is coming from, and it splices the strings. Any suggestions on getting rid of the \n and creating an infinite sequence of 7's for the string length for the cc vector? Is there a better way to do this?
Sarah
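
On where the \n comes from: judging by the readChar() output above, the file seems to hold one code per line, so every eighth character read is a line break and each 7-character read drifts one position further. A minimal sketch under that assumption, using a small temporary file with made-up codes in place of "Master": readLines() splits on the line endings and discards them, so no vector of 7's is needed at all. The second step is a fallback in case some lines hold several codes run together.

```r
# Write a small stand-in for "Master" to a temporary file
# (hypothetical data, one code per line).
tmp <- tempfile()
writeLines(c("E123456", "E234567", "E345678"), tmp)

# readLines() returns one element per line, with line endings removed.
codes <- readLines(tmp)
print(codes)

# Fallback if codes may also be run together on a line:
# join everything into one string, then re-extract each E + 6 digits.
all_text <- paste(codes, collapse = "")
codes2 <- regmatches(all_text, gregexpr("E[[:digit:]]{6}", all_text))[[1]]
print(codes2)

# Remove the temporary file only; the real data file is left alone.
unlink(tmp)
```

If readChar() is still preferred, rep(7, n) gives a vector of n sevens, but n has to be known in advance and the line breaks still have to be accounted for.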


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
