Thanks to Gabor Grothendieck and Dennis Murphy I can now solve the first part of my problem and already impress my colleagues with the R program below (I know it could be written in a smarter way, but I am learning). It reads my partly comma-separated, partly underscore-separated string and cleans it up in a very neat way.
Regardless of my inability to write tight code, I moved on to the second part of my quest: to put it all into a loop so that I can loop over my approximately 100 .txt files in /usr2/username/data/. I got started with list.files() and my loop is more or less working, but I got stuck on the last cbind part. Is there a friendly R-hacker out there who would be willing to take a look at my loop below?

Thanks,
Eric

###################################################
##                                               ##
##  The answer to the first part of my question  ##
##                                               ##
###################################################

Line <- readLines(file("/usr2/efail/data/example.txt"))
s <- strsplit(Line, "ZZ_")[[1]]
s2 <- sub("BLOCK.*", "BLOCK", s)
s3 <- sub("@9z.svg", "", s2)
s4 <- gsub("_", ",", s3)
s5 <- read.table(textConnection(s4[1]), sep = ",")
DF <- read.table(textConnection(s4), skip = 1, sep = ",", as.is = TRUE)
DF$block <- head(cumsum(c("", DF$V8) == "BLOCK")+1, -1)
DF$run <- ave(DF$block, DF$block, FUN = seq_along)
DF$V8 <- NULL
names(DF) <- c("IngNam", "Tx", "Ty", "Treatment", "x", "y", "Y", "BLOCK", "RUN")
DF$ID <- s5$V1
DF

###############################
##                           ##
##  The PARTLY WORKING loop  ##
##                           ##
###############################

fname <- list.files("/usr2/efail/data", pattern = ".txt", full.names = TRUE, recursive = TRUE, ignore.case = TRUE)
for (sp in 1:length(fname)) {
  Line <- readLines(file(fname[sp]))
  s <- strsplit(Line, "ZZ_")[[1]]
  s2 <- sub("BLOCK.*", "BLOCK", s)
  s3 <- sub("@9z.svg", "", s2)
  s4 <- gsub("_", ",", s3)
  s5 <- read.table(textConnection(s4[1]), sep = ",")
  DF <- read.table(textConnection(s4), skip = 1, sep = ",", as.is = TRUE)
  DF$block <- head(cumsum(c("", DF$V8) == "BLOCK")+1, -1)
  DF$run <- ave(DF$block, DF$block, FUN = seq_along)
  DF$V8 <- NULL
  names(DF) <- c("IngNam", "Tx", "Ty", "Treatment", "x", "y", "Y", "BLOCK", "RUN")
  DF$ID <- s5$V1
  FINAL.DF <- cbind(DF… ## This is where I got stuck.
}

On Mon, Mar 7, 2011 at 8:18 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Sun, Mar 6, 2011 at 10:13 PM, Eric Fail <eric.f...@gmx.com> wrote:
>> Dear R-list,
>>
>> I have a partly comma separated partly underscore separated string that I am
>> trying to parse into R.
>>
>> Furthermore I have a bunch of them, and they are quite long. I have now
>> spent most of my Sunday trying to figure this out and thought I would try
>> the list to see if someone here would be able to get me started.
>>
>> My data structure looks like this,
>>
>> (in a example.txt file)
>> Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by 960
>> pixels, On Device M, M,
>> 3.2.4,zz_373_462_488_...@9z.svg,592,820,3.35,zz_032_288_436_...@9z.svg,332,878,3.66,zz_384_204_433_...@9z.svg,334,824,3.28,zz_365_575_683_...@9z.svg,598,878,3.50,zz_005_480_239_...@9z.svg,630,856,8.03,zz_030_423_394_...@9z.svg,98,846,4.09,zz_033_596_398_...@9z.svg,636,902,3.28,zz_263_064_320_...@9z.svg,570,894,1.26,bl...@9z.svg,322,842,32.96,zz_004_088_403_...@9z.svg,606,908,3.32,zz_703_546_434_...@9z.svg,624,934,2.58,zz_712_348_543_...@9z.svg,20,828,5.36,zz_005_48_239_...@9z.svg,580,830,4.36,zz_310_444_623_...@9z.svg,586,806,0.08,zz_030_423_394_...@9z.svg,350,854,3.84,zz_340_382_539_...@9z.svg,570,894,1.26,bl...@9z.svg,542,840,4.44,zz_345_230_662_...@9z.svg,632,844,2.47,zz_006_335_309_...@9z.svg,96,930,3.63,zz_782_346_746_...@9z.svg,306,850,2.58,zz_334_200_333_...@9z.svg,304,842,3.34,zz_383_506_726_...@9z.svg,622,884,3.84,zz_294_360_448_...@9z.svg,90,858,3.56,zz_334_335_473_...@9z.svg,570,894,1.26,bl...@9z.svg,320,852,4.04,
>> (end of example.txt file)
>>
>> The above is approximate 5% of the length of a full file, and then I got
>> about 100 of them. Please note that the strings end with a comma.
>>
>> I am trying to parse it into something like this
>>
>> ID          ImgNam  BLOCK  RUN  Tx   Ty   Treatment  x    y    Y
>> Subject ID  373     1      1    462  488  TRT        592  820  3.35
>> Subject ID  32      1      2    288  436  CON        332  878  3.66
>> Subject ID  384     1      3    204  433  TRT        334  824  3.28
>> Subject ID  365     1      4    575  683  TRT        598  878  3.5
>> Subject ID  5       1      5    480  239  CON        630  856  8.03
>> Subject ID  30      1      6    423  394  CON        98   846  4.09
>> Subject ID  33      1      7    596  398  CON        636  902  3.28
>> Subject ID  263     1      8    64   320  TRT        570  894  1.26
>> Subject ID  4       2      1    88   403  CON        606  908  3.32
>> Subject ID  703     2      2    546  434  CON        624  934  2.58
>> Subject ID  712     2      3    348  543  CON        20   828  5.36
>> Subject ID  5       2      4    48   239  CON        580  830  4.36
>> Subject ID  310     2      5    444  623  TRT        586  806  0.08
>> Subject ID  30      2      6    423  394  CON        350  854  3.84
>> Subject ID  340     2      7    382  539  TRT        570  894  1.26
>> Subject ID  345     3      1    230  662  TRT        632  844  2.47
>> Subject ID  6       3      2    335  309  CON        96   930  3.63
>> Subject ID  782     3      3    346  746  TRT        306  850  2.58
>> Subject ID  334     3      4    200  333  TRT        304  842  3.34
>> Subject ID  383     3      5    506  726  TRT        622  884  3.84
>> Subject ID  294     3      6    360  448  TRT        90   858  3.56
>> Subject ID  334     3      7    335  473  TRT        570  894  1.26
>>
>> I could do it in Excel, but it would take me a week--and it would be
>> stupid--if someone could please help me get started I would very much
>> appreciate it. It would not only benefit me, but my colleagues would see the
>> benefit of R and the R-list in particular.
>>
>
> Try this. We split the line by ZZ_ giving s and remove the junk after
> the word BLOCK giving s2. Then we remove @9z.svg giving s3 and
> convert each _ to , giving s4. We then read it into a data frame
> using comma as the separator, calculate the block and run columns,
> remove one junk column and assign column names.
>
>> Line <- "Subject ID,ExperimentName,2010-04-23,32:34:23,Version 0.4, 640 by
>> 960 pixels, On Device M, M,
>> 3.2.4,zz_373_462_488_...@9z.svg,592,820,3.35,zz_032_288_436_...@9z.svg,332,878,3.66,zz_384_204_433_...@9z.svg,334,824,3.28,zz_365_575_683_...@9z.svg,598,878,3.50,zz_005_480_239_...@9z.svg,630,856,8.03,zz_030_423_394_...@9z.svg,98,846,4.09,zz_033_596_398_...@9z.svg,636,902,3.28,zz_263_064_320_...@9z.svg,570,894,1.26,bl...@9z.svg,322,842,32.96,zz_004_088_403_...@9z.svg,606,908,3.32,zz_703_546_434_...@9z.svg,624,934,2.58,zz_712_348_543_...@9z.svg,20,828,5.36,zz_005_48_239_...@9z.svg,580,830,4.36,zz_310_444_623_...@9z.svg,586,806,0.08,zz_030_423_394_...@9z.svg,350,854,3.84,zz_340_382_539_...@9z.svg,570,894,1.26,bl...@9z.svg,542,840,4.44,zz_345_230_662_...@9z.svg,632,844,2.47,zz_006_335_309_...@9z.svg,96,930,3.63,zz_782_346_746_...@9z.svg,306,850,2.58,zz_334_200_333_...@9z.svg,304,842,3.34,zz_383_506_726_...@9z.svg,622,884,3.84,zz_294_360_448_...@9z.svg,90,858,3.56,zz_334_335_473_...@9z.svg,570,894,1.26,bl...@9z.svg,320,852,4.04,"
>>
>> s <- strsplit(Line, "ZZ_")[[1]]
>> s2 <- sub("BLOCK.*", "BLOCK", s)
>> s3 <- sub("@9z.svg", "", s2)
>> s4 <- gsub("_", ",", s3)
>> DF <- read.table(textConnection(s4), skip = 1, sep = ",", as.is = TRUE)
>> DF$block <- head(cumsum(c("", DF$V8) == "BLOCK")+1, -1)
>> DF$run <- ave(DF$block, DF$block, FUN = seq_along)
>> DF$V8 <- NULL
>> names(DF) <- c("IngNam", "Tx", "Ty", "Treatment", "x", "y", "Y", "BLOCK",
>> "RUN")
>> DF
>    IngNam  Tx  Ty Treatment   x   y    Y BLOCK RUN
>  1    373 462 488       TRT 592 820 3.35     1   1
>  2     32 288 436       CON 332 878 3.66     1   2
>  3    384 204 433       TRT 334 824 3.28     1   3
>  4    365 575 683       TRT 598 878 3.50     1   4
>  5      5 480 239       CON 630 856 8.03     1   5
>  6     30 423 394       CON  98 846 4.09     1   6
>  7     33 596 398       CON 636 902 3.28     1   7
>  8    263  64 320       TRT 570 894 1.26     1   8
>  9      4  88 403       CON 606 908 3.32     2   1
> 10    703 546 434       CON 624 934 2.58     2   2
> 11    712 348 543       CON  20 828 5.36     2   3
> 12      5  48 239       CON 580 830 4.36     2   4
> 13    310 444 623       TRT 586 806 0.08     2   5
> 14     30 423 394       CON 350 854 3.84     2   6
> 15    340 382 539       TRT 570 894 1.26     2   7
> 16    345 230 662       TRT 632 844 2.47     3   1
> 17      6 335 309       CON  96 930 3.63     3   2
> 18    782 346 746       TRT 306 850 2.58     3   3
> 19    334 200 333       TRT 304 842 3.34     3   4
> 20    383 506 726       TRT 622 884 3.84     3   5
> 21    294 360 448       TRT  90 858 3.56     3   6
> 22    334 335 473       TRT 570 894 1.26     3   7
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
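
On the step marked "This is where I got stuck" at the top of this message: each file yields a data frame with the same columns, so the pieces probably want stacking by row (rbind) rather than joining by column (cbind). Below is a minimal sketch of one way to do that, not a definitive answer. It wraps the parsing steps from the thread in a function and combines the per-file results with do.call(rbind, ...). The function name parse_one, the temporary directory, and the two toy files are made up for illustration. The toy lines assume the upper-case ZZ_ and BLOCK markers that the thread's code searches for, and one record per file.

```r
# Hypothetical helper (not from the thread): the posted parsing steps as a function.
parse_one <- function(path) {
  Line <- readLines(path, warn = FALSE)              # assumes one record per file
  s  <- strsplit(Line, "ZZ_", fixed = TRUE)[[1]]     # header chunk, then one chunk per trial
  s2 <- sub("BLOCK.*", "BLOCK", s)                   # drop the junk after a BLOCK marker
  s3 <- sub("@9z.svg", "", s2, fixed = TRUE)
  s4 <- gsub("_", ",", s3, fixed = TRUE)
  DF <- read.table(text = s4, skip = 1, sep = ",", as.is = TRUE)
  # A BLOCK marker at the end of one row means the next row starts a new block.
  DF$block <- head(cumsum(c("", DF$V8) == "BLOCK") + 1, -1)
  DF$run   <- ave(DF$block, DF$block, FUN = seq_along)  # counter within each block
  DF$V8    <- NULL
  # "ImgNam" follows the desired output table; the thread's code spells it "IngNam".
  names(DF) <- c("ImgNam", "Tx", "Ty", "Treatment", "x", "y", "Y", "BLOCK", "RUN")
  DF$ID <- strsplit(s4[1], ",", fixed = TRUE)[[1]][1]   # first field of the header
  DF
}

# Toy data (made up): two small files in a fresh temporary directory.
tmp <- tempfile("demo_")
dir.create(tmp)
writeLines(paste0("S01,Exp,ZZ_373_462_488_TRT@9z.svg,592,820,3.35,",
                  "ZZ_032_288_436_CON@9z.svg,332,878,3.66,BLOCK@9z.svg,322,842,32.96,",
                  "ZZ_004_088_403_CON@9z.svg,606,908,3.32,"),
           file.path(tmp, "subj01.txt"))
writeLines(paste0("S02,Exp,ZZ_703_546_434_CON@9z.svg,624,934,2.58,BLOCK@9z.svg,1,2,3.00,",
                  "ZZ_712_348_543_CON@9z.svg,20,828,5.36,"),
           file.path(tmp, "subj02.txt"))

# "\\.txt$" matches only names that end in .txt; ".txt" would match more loosely.
fname <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE,
                    recursive = TRUE, ignore.case = TRUE)

# Parse every file, then stack the resulting data frames row by row.
FINAL.DF <- do.call(rbind, lapply(fname, parse_one))
FINAL.DF
```

For the real data, fname would come from list.files("/usr2/efail/data", ...) as in the loop above. Collecting the pieces in a list and binding once at the end also avoids growing FINAL.DF inside the loop.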