Hello,

First of all, it's better to post data using ?dput. Below, I give an example of that in the lines structure(...).

dat <-
structure(list(rs = c(" rs941873 ", " rs634552 ", " rs11107175 ",
" rs12307687 ", " rs3917155 ", " rs1600640 ", " rs2871865 ",
" rs2955250 ", " rs228758 ", " rs224333 ", " rs4681725 ",
" rs7652177 ", " rs925098 ", " rs1662837 ", " rs10071837 "
), n0 = c(81139462, 75282052, 94161719, 47175866, 76444685,
84603034, 99194896, 61959740, 42148205, 34023962, 56692321,
171969077, 17919811, 82168889, 33381581), Pvalue = c(1.52e-07,
0.108, 0.0285, 0.123, 0.68, 0.000275, 0.0709, 0.0317, 0.0772,
0.021, 0.000445, 0.000634, 5.55e-09, 8.66e-05, 0.000574),
V1 = c("rs941873", "rs941873", "rs941873", "rs12307687",
"rs941873", "rs12307687", "rs12307687", "rs12307687",
"rs12307687", "rs10071837", "rs10071837", "rs10071837",
"rs925098", "rs925098", "rs925098")), .Names = c("rs", "n0",
"Pvalue", "V1"), row.names = c(NA, -15L), class = "data.frame")
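As a small aside on how such a structure(...) call is produced: the sketch below builds a tiny data frame, prints its dput() representation, and checks that evaluating that text gives the same object back. The name `small` and its three rows are a hypothetical subset chosen for illustration, not the poster's full data.

```r
# Hypothetical three-row subset, for illustration only
small <- data.frame(
  rs     = c("rs941873", "rs634552", "rs11107175"),
  Pvalue = c(1.52e-07, 0.108, 0.0285),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; this is what to paste
# into a message to the list
dput(small)

# Round trip: capture the printed code as text, then parse and evaluate it
txt     <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Compare the rebuilt object with the original
print(all.equal(small, rebuilt))
```

Anyone on the list can then paste the printed structure(...) text into their session and work with exactly the same data.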
Now, if I understand correctly, the following might do what you want.

# Split the p-values into one vector per V1 group
tmp <- split(dat[, "Pvalue"], dat[, "V1"])
# Flag each group's minimum; the flags come out in group-sorted order,
# so order(order(...)) maps them back to the original row order
idx <- unlist(lapply(tmp, function(x) x == min(x)))[order(order(dat[, "V1"]))]
rm(tmp)
# Keep only the flagged rows (ties within a group are all kept)
result <- dat[idx, ]
result

Hope this helps,

Rui Barradas

Quoting oslo via R-help <r-help@r-project.org>:

> Hi all;
> I have a big data set (a small part is given below) and the V1 column
> has repeated info in it. That is, rs941873, rs12307687... are
> repeated many times. I need to choose only one SNP (in the first column,
> named rs), the one which has the smallest Pvalue within each V1 value.
> That is, I need to choose only one SNP for each repeated name in V1,
> the one with the smallest Pvalue.
> Your help is truly appreciated,
> Oslo
>
> | rs         | n0        | Pvalue   | V1         |
> | rs941873   | 81139462  | 1.52E-07 | rs941873   |
> | rs634552   | 75282052  | 1.08E-01 | rs941873   |
> | rs11107175 | 94161719  | 2.85E-02 | rs941873   |
> | rs12307687 | 47175866  | 1.23E-01 | rs12307687 |
> | rs3917155  | 76444685  | 6.80E-01 | rs941873   |
> | rs1600640  | 84603034  | 2.75E-04 | rs12307687 |
> | rs2871865  | 99194896  | 7.09E-02 | rs12307687 |
> | rs2955250  | 61959740  | 3.17E-02 | rs12307687 |
> | rs228758   | 42148205  | 7.72E-02 | rs12307687 |
> | rs224333   | 34023962  | 2.10E-02 | rs10071837 |
> | rs4681725  | 56692321  | 4.45E-04 | rs10071837 |
> | rs7652177  | 171969077 | 6.34E-04 | rs10071837 |
> | rs925098   | 17919811  | 5.55E-09 | rs925098   |
> | rs1662837  | 82168889  | 8.66E-05 | rs925098   |
> | rs10071837 | 33381581  | 5.74E-04 | rs925098   |
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.
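P.S. As a possible cross-check on the split()/lapply() approach in the reply above, the same "smallest Pvalue per V1 group" selection can be sketched with base R's ave(). This is an alternative I am suggesting, not the code from the reply, and `dat_small` is a hypothetical six-row subset of the posted data used only to keep the example self-contained.

```r
# Hypothetical six-row subset of the posted data, for illustration only
dat_small <- data.frame(
  rs     = c("rs941873", "rs634552", "rs12307687",
             "rs1600640", "rs925098", "rs1662837"),
  Pvalue = c(1.52e-07, 0.108, 0.123, 0.000275, 5.55e-09, 8.66e-05),
  V1     = c("rs941873", "rs941873", "rs12307687",
             "rs12307687", "rs925098", "rs925098"),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the minimum Pvalue of that row's V1 group,
# already aligned with the original row order
group_min <- ave(dat_small$Pvalue, dat_small$V1, FUN = min)

# Keep the rows whose Pvalue equals their group's minimum
# (as with the split() version, ties within a group are all kept)
result_ave <- dat_small[dat_small$Pvalue == group_min, ]
print(result_ave)
```

Because ave() keeps the original row order, no order(order(...)) step is needed to realign the logical index.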