Do any of your Entrez.Gene values appear in more than one row in either of the dataframes?
For example:

> dfa <- data.frame( a = c('A','A','B','C') , x= 1:4)
> dfb <- data.frame( a = c('A','B','B','D'), y=1:4)
>
> merge(dfa,dfb)
  a x y
1 A 1 1
2 A 2 1
3 B 3 2
4 B 3 3
> length(intersect(dfa$a, dfb$a))
[1] 2

-Don

At 8:40 PM -0400 3/17/08, Mark W Kimpel wrote:
>I have used merge regularly and thought I understood how it worked, but
>I must not. I have two dataframes with identical colnames from two
>different experiments, TL01 and LC01. Each dataframe has a column named
>"Entrez.Gene", which I have converted to "as.character" just to make
>sure merge is not looking at factor levels. Because I have done some
>filtering, the Entrez.Gene values in each experiment overlap but are not
>identical. I want to produce a summary report with only those
>identifiers found in each experiment. I could do this with intersect and
>matching, but I thought merge could easily do this.
>
>Below is my code and sessionInfo. For some reason there are over twice
>as many rows as I would expect. I can't quite figure out which arguments
>I have screwed up. What am I missing? It has to be something simple, I'm
>just not seeing it.
>Thanks,
>Mark
>
> > TL01.LC01.data <- merge(TL01.data, LC01.data, by = "Entrez.Gene",
>all.x = FALSE, all.y = FALSE, suffixes = c(".TL01",".LC01"))
> > length(intersect(TL01.data$Entrez.Gene, LC01.data$Entrez.Gene))
>[1] 13401
> > dim(TL01.LC01.data)
>[1] 29471 57
> > dim(TL01.data)
>[1] 16479 29
> > dim(LC01.data)
>[1] 16479 29
>--
> > sessionInfo()
>R version 2.7.0 Under development (unstable) (2008-03-05 r44683)
>x86_64-unknown-linux-gnu
>
>locale:
>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
>attached base packages:
>[1] splines tools stats graphics grDevices datasets utils
>[8] methods base
>
>other attached packages:
> [1] affycoretools_1.11.4 annaffy_1.11.5 KEGG.db_2.1.3
> [4] gcrma_2.11.4 matchprobes_1.11.1 biomaRt_1.13.9
> [7] RCurl_0.8-3 GOstats_2.5.2 Category_2.5.7
>[10] genefilter_1.17.12 survival_2.34 RBGL_1.15.7
>[13] annotate_1.17.11 xtable_1.5-2 GO.db_2.1.3
>[16] AnnotationDbi_1.1.26 RSQLite_0.6-8 DBI_0.2-4
>[19] graph_1.17.17 limma_2.13.6 affy_1.17.9
>[22] preprocessCore_1.1.5 affyio_1.7.15 Biobase_1.99.2
>
>loaded via a namespace (and not attached):
>[1] cluster_1.11.10 XML_1.93-2
>
>Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
>Indiana University School of Medicine
>
>15032 Hunter Court, Westfield, IN 46074
>
>(317) 490-5129 Work, & Mobile & VoiceMail
>(317) 204-4202 Home (no voice mail please)
>
>mwkimpel<at>gmail<dot>com
>
>______________________________________________
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
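The dfa/dfb example shows the mechanism: merge() returns one row for every pair of matching rows, so a key that occurs m times in one data frame and n times in the other contributes m*n rows to the result. Below is a minimal sketch for checking this, assuming two small made-up data frames, tl01 and lc01 (hypothetical stand-ins for TL01.data and LC01.data, not the real data). It counts repeated identifiers, predicts the merged row count from the per-key counts, and, if one row per identifier is wanted, drops repeats before merging.

```r
# Hypothetical toy stand-ins for TL01.data and LC01.data (made-up values)
tl01 <- data.frame(Entrez.Gene = c("100", "100", "200", "300"),
                   score = c(1.5, 2.0, 0.7, 3.1),
                   stringsAsFactors = FALSE)
lc01 <- data.frame(Entrez.Gene = c("100", "200", "200", "400"),
                   score = c(0.9, 1.1, 1.3, 2.2),
                   stringsAsFactors = FALSE)

# 1. Rows whose identifier already appeared earlier in the same data frame
n_dup_tl01 <- sum(duplicated(tl01$Entrez.Gene))
n_dup_lc01 <- sum(duplicated(lc01$Entrez.Gene))

# 2. Predicted size of the inner merge: for each shared key, multiply its
#    row count in tl01 by its row count in lc01, then sum over the keys
common <- intersect(tl01$Entrez.Gene, lc01$Entrez.Gene)
counts_tl01 <- table(tl01$Entrez.Gene)
counts_lc01 <- table(lc01$Entrez.Gene)
expected_rows <- sum(counts_tl01[common] * counts_lc01[common])

merged <- merge(tl01, lc01, by = "Entrez.Gene",
                suffixes = c(".TL01", ".LC01"))
stopifnot(nrow(merged) == expected_rows)  # more rows than length(common)

# 3. One way to get one row per shared identifier: keep only the first
#    occurrence of each key before merging (later duplicates are dropped)
merged_unique <- merge(tl01[!duplicated(tl01$Entrez.Gene), ],
                       lc01[!duplicated(lc01$Entrez.Gene), ],
                       by = "Entrez.Gene", suffixes = c(".TL01", ".LC01"))
```

Keeping the first occurrence with !duplicated() is only one choice. If the duplicated rows carry different values, summarizing them first (for example with aggregate()) may be more appropriate.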
-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062