Hi Bert,

I really appreciate this. I need to check your code for task 2 tomorrow at my office on the real data, as I am having difficulty (because of a technical issue) connecting remotely. I am sure it will work well.
I am sorry that I was not able to explain my first question clearly. Basically, the values in ref represent regions of a chromosome. I need to select these regions in map (all region values in ref exist in the first column of map, map$reg), then sum the column map$rate over each region and count the rows it takes for the sum to give >0.85.

For example, consider the first row of ref: 29220 and 63933. After sorting map by its first column, I sum map$rate only between 29220 and 63933 in the sorted map, cut off at >0.85, and count how many rows of the sorted map give >0.85. Suppose there are 38 rows between 29220 and 63933 in the sorted map$reg, and summing only the first 12 of them gives >0.85. Then my answer is 12 for 29220 - 63933 in ref.

Thanks a lot for your patience.

Cheers,
Greg

On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hho...@gmail.com> wrote:
> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>
>> Greg:
>>
>> I was not able to understand your task 1. Perhaps others can.
>>
>> My understanding of your task 2 is that for each row of ref, you wish
>> to find all rows of map such that the reg values in those rows fall
>> between the reg1 and reg2 values in ref (inclusive; change <= to < if
>> you don't want the endpoints), and then you want the minimum map$p
>> value of all those rows. If that is correct, I believe this will do
>> it (but caution, untested, as you failed to provide data in a
>> convenient form, e.g. using dput()):
>>
>> task2 <- with(map, vapply(seq_len(nrow(ref)), function(i)
>>     min(p[ref[i,1] <= reg & reg <= ref[i,2]]), 0))
>>
>> If my understanding is incorrect, please ignore both the above and the
>> following:
>>
>> The "solution" I have given above seems inefficient, so others may be
>> able to significantly improve it if you find that it takes too long.
>> OTOH, my understanding of your specification is that you need to
>> search for all rows in the map data frame that meet the criterion for
>> each row of ref, and without further information, I don't know how to
>> do this without just repeating the search 560 times.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hho...@gmail.com> wrote:
>> > Dear all;
>> >
>> > I have two data sets, map and ref. A small part of each data set
>> > is given below. map has more than 27 million rows and ref has about
>> > 560 rows. Basically I need to run two different tasks. My R code for
>> > these tasks is given below, but it does not work properly.
>> >
>> > I sincerely appreciate your help.
>> >
>> > Regards,
>> >
>> > Greg
>> >
>> > Task 1)
>> >
>> > For example, the first and second columns of row 1 in ref are 29220
>> > and 63933. So I need to write R code that first looks at the first row
>> > in ref (29220 and 63933), then sums the column map$rate and gives the
>> > number of rows that give >0.85, and then does the same for the second,
>> > third, ... rows in ref. At the end I would like the table given below
>> > (the results I need). Please notice that all values specified in ref
>> > exist in the map$reg column.
>> >
>> > Task 2)
>> >
>> > Again, for example, the first and second columns of row 1 in ref are
>> > 29220 and 63933. So I need to write R code that gives the minimum
>> > map$p for the 29220 - 63933 interval in map, and then does the same
>> > for the second, third, ... rows in ref.
>> >
>> > #my attempt for the first question
>> >
>> > temp<-map[order(map$reg, map$p),]
>> > count<-1
>> > temp<-unique(temp$reg
>> > for(i in 1:length(ref) {
>> > for(j in 1:length(ref)
>> > {
>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,] & temp[cumsum(temp$rate)>0.70,])
>> > count=count+1
>> > }
>> > }
>> >
>> > #my attempt for the second question
>> >
>> > temp<-map[order(map$reg, map$p),]
>> > count<-1
>> > temp<-unique(temp$reg
>> > for(i in 1:length(ref) {
>> > for(j in 1:length(ref)
>> > {
>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
>> > output<-temp2[temp2$p==min(temp2$p),]
>> > }
>> > }
>> >
>> > Data sets
>> >
>> > Data= map
>> >
>> > reg    p      rate
>> > 10276  0.700  3.867e-18
>> > 71608  0.830  4.542e-16
>> > 29220  0.430  1.948e-15
>> > 99542  0.220  1.084e-15
>> > 26441  0.880  9.675e-14
>> > 95082  0.090  7.349e-13
>> > 36169  0.480  9.715e-13
>> > 55572  0.500  9.071e-12
>> > 65255  0.300  1.688e-11
>> > 51960  0.970  1.163e-10
>> > 55652  0.388  3.750e-10
>> > 63933  0.250  9.128e-10
>> > 35170  0.720  7.355e-09
>> > 06491  0.370  1.634e-08
>> > 85508  0.470  1.057e-07
>> > 86666  0.580  7.862e-07
>> > 04758  0.810  9.501e-07
>> > 06169  0.440  1.104e-06
>> > 63933  0.750  2.624e-06
>> > 41838  0.960  8.119e-06
>> >
>> > data=ref
>> >
>> > reg1   reg2
>> > 29220  63933
>> > 26441  41838
>> > 06169  10276
>> > 74806  92643
>> > 73732  82451
>> > 86042  93502
>> > 85508  95082
>> >
>> > the results I need
>> >
>> > reg1   reg2   n
>> > 29220  63933  12
>> > 26441  41838  78
>> > 06169  10276  125
>> > 74806  92643  11
>> > 73732  82451  47
>> > 86042  93502  98
>> > 85508  95082  219
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
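For reference, the task 1 rule clarified at the top of this message (sort map by reg, take the rows whose reg lies between reg1 and reg2, and count how many rows it takes for the running sum of rate to pass 0.85) could be sketched in R roughly as follows. This is only a sketch under assumptions, not tested on the real data: the toy map and ref values are made up for illustration, count_to_cutoff is a hypothetical helper name, the interval endpoints are treated as inclusive, and NA is returned when the sum over the whole interval never passes the cutoff.

```r
# Toy data (hypothetical values, chosen so the running sums are easy to check).
map <- data.frame(
  reg  = c(50, 10, 30, 20, 40, 60),
  p    = c(0.50, 0.70, 0.43, 0.88, 0.25, 0.09),
  rate = c(0.10, 0.30, 0.40, 0.20, 0.30, 0.50)
)
ref <- data.frame(reg1 = c(20, 30, 10), reg2 = c(50, 60, 20))

# Sort map by reg once, so rows inside an interval come out in reg order.
smap <- map[order(map$reg), ]

# For one interval [lo, hi] (endpoints inclusive, an assumption):
# number of rows, counted from the start of the interval, needed before
# the running sum of rate first exceeds the cutoff.
# Returns NA if the sum over the whole interval never exceeds it.
count_to_cutoff <- function(lo, hi, cutoff = 0.85) {
  rate <- smap$rate[smap$reg >= lo & smap$reg <= hi]
  hit <- which(cumsum(rate) > cutoff)
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}

# One count per row of ref.
ref$n <- mapply(count_to_cutoff, ref$reg1, ref$reg2)
print(ref)
```

On the toy data, the first interval (20 to 50) has running sums 0.2, 0.6, 0.9, 1.0, so n is 3; the third interval never passes 0.85 and gives NA. Like the task 2 code quoted above, this scans map$reg once per row of ref, so with 27 million rows and about 560 intervals it may take a while.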