dear all;
I am sorry for this posting. I have got help from Jim, Bert, Jeff and PIKAL on similar issue before. I tried to modify Jim`s code to the real data but it did not work. Now I am posting first two rows the imitation of real data using dput() format (please see at the bottom). I have two data sets, data=map and data=ref. The first to rows of each data set are given below. Data map has more than 27 million and data ref has about 560 rows. Basically I need run two different tasks. My R codes for these task are given below but they do not work properly. I sincerely do appreciate your helps. Regards, Greg Task 1) For example, the first and second columns for row 1 in data ref are chr1, 6457839 and 6638389. So I need write an R code normally first look the first row in ref (which they are chre1 6457839 and 6638389) than summing the column of "map$post_prob" and give the number of map$snp falls between 6457839 and 6638389 that their cumulative sum is >0.85. Then do the same for the second, third....in ref. At the end I would like a table gave below (need_ouput). Please notice the all value specified info in ref data file are exist in map$CHR and map$POS columns. Task2) Again example, the first and second columns for row 1 in data ref are chr1, 6457839 and 6638389. So I need that R gives me the minimum map$p for the 2 chr1, 6457839 and 6638389 (as there are many snps between these regions and would like choose the smallest one in those regions. Than do the same for the second, third....rows in ref. Then put the results of Task1 and Task2 into need_ouput file #R codes modified from Jim map2<-map[order(map$CHR, map$POS, -map$post_prob),] # get a field for the counts ref$n<-NA # and a field for the minimum p values ref$min_p<-NA # get the number of rows in "ref" nref<-dim(ref)[1] for(i in 1:nref) { CHR<- which(map2$CHR==ref$CHR[i]) POS_start<-which(map2$POS==ref$POS_start[i]) POS_end<-which(map2$POS==ref$POS_end[i]) cat("CHR", "CHR"," POS_start",POS_start,"POS_end",POS_end,"\n") # get the range of matches POSrange<-range(c(CHR,POS_start,POS_end)) # convert this to a sequence spanning all matches allPOS<-POSrange[1]:POSrange[2] ref$n[i]<-sum(map2$post_prob[allPOS] > 0.99) ref$min_p[i]<-min(map2$p[allPOS]) } dput(map) structure(list(CHR = structure(c(1L, 1L), .Label = "chr1", class = "factor"), snp = structure(1:2, .Label = c("rs4747841", "rs4749917"), class = "factor"), Allel1 = structure(1:2, .Label = c("A", "T"), class = "factor"), Allel2 = structure(c(2L, 1L), .Label = c("C", "G"), class = "factor"), fr = c(0.551, 0.436), effec = c(-0.0011, 0.0011), SE = c(0.0029, 0.0029), p = c(0.7, 0.7), POS = c(9960129L, 9960259L), post_prob = c(1.248817e-158, 1.248817e-158)), .Names = c("CHR", "snp", "Allel1", "Allel2", "fr", "effec", "SE", "p", "POS", "post_prob"), class = "data.frame", row.names = c(NA, -2L)) dput(ref) structure(list(CHR = structure(1:2, .Label = c("chr10", "chr14" ), class = "factor"), POS_start = c(6457839L, 21005246L), POS_end = c(6638389L, 21550658L)), .Names = c("CHR", "POS_start", "POS_end"), class = "data.frame", row.names = c(NA, -2L)) dput(need_output) structure(list(CHR = structure(1:2, .Label = c("chr1", "chr22" ), class = "factor"), POS = c(312127953L, 46487552L), POS_start = c(32036927L, 45766451L), POS_end = c(3232240262, 46801601), snp = structure(1:2, .Label = c("rs1143427", "rs55958907"), class = "factor"), alle1l = structure(1:2, .Label = c("G", "T"), class = "factor"), allel2 = structure(1:2, .Label = c("A", "G"), class = "factor"), fr = c(0.278, 0.974), effec = c(0.6, 0.106), SE = c(0.015, 0.027), P = c(0.000156, 7.63e-05), post_prob = c(0.229, 0.125), n = c(612L, 4218L)), .Names = c("CHR", "POS", "POS_start", "POS_end", "snp", "alle1l", "allel2", "fr", "effec", "SE", "P", "post_prob", "n"), class = "data.frame", row.names = c(NA, -2L )) [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.