With the format you have, we have to split out the comma-separated genes and then cross-tabulate them with 'table'. Here is one way of doing it:

> x <- readLines(textConnection(" Function        x
+ Function1       gene5, gene19, gene22, gene23
+ Function2       gene1, gene7, gene19
+ Function3       gene2, gene3, gene7, gene23"))
> closeAllConnections()
> # funny data; split it up.  get rid of header
> x <- x[-1]
> # split on blanks
> x.b <- strsplit(x, "[[:blank:]]+")
> # recombine into a 'long' format
> x.c <- lapply(x.b, function(z) cbind(z[1], unlist(strsplit(z[-1], ","))))
> x.c <- do.call(rbind, x.c)
> table(list(x.c[,1], x.c[,2]))
           .2
.1          gene1 gene19 gene2 gene22 gene23 gene3 gene5 gene7
  Function1     0      1     0      1      1     0     1     0
  Function2     1      1     0      0      0     0     0     1
  Function3     0      0     1      0      1     1     0     1
>

On 2/20/08, Paul Christoph Schröder <[EMAIL PROTECTED]> wrote:
> I'm sorry if I didn't write it the right way. I'm just starting in the world
> of R and it's not that easy at the beginning.
> I wrote it again with code and comments. I hope it is understandable now. Do
> you think I should post it again in this shape?
>
> # func_gen contains functions (rows) and genes (columns);
> # func_gen is a data.frame
> func_gen <- read.delim(file, header=T)
>
> # It looks like this:
> # Function   x
> # Function1  gene5, gene19, gene22, gene23
> # Function2  gene1, gene7, gene19
> # Function3  gene2, gene3, gene7, gene23
>
> # Duplicates of genes exist between different functions. This is why the
> # "read.delim" command was used instead of the "read.table" command,
> # because of the "duplicate 'row.names' are not allowed" error.
>
> all_genes  # contains all genes from the above data frame; all_genes is a data.frame
> # It looks like this:
> # Genes
> # gene1
> # gene2
> # gene3
> # gene5
> # gene7
> # gene19
> # gene 22
> # gene 23
>
> func_gen[,2] %in% all_genes  # this should result in a true-false matrix
> # Like this:
> # Function   gene1 gene2 gene3 gene5 gene7 gene19 gene22 gene23
> # Function1    F     F     F     T     F     T      T      T
> # Function2    T     F     F     F     T     T      F      F
> # Function3    F     T     T     F     T     F      F      T
>
> # and instead I obtain a true-false matrix with only FALSE values.
>
> Thanks in advance!
> Paul
>
> --
> Paul C. Schröder
> PhD-Student
> Division of Proteomics, Genomics & Bioinformatics
> Center for Applied Medicine (CIMA)
> University of Navarra
> Avda. Pio XII, 55
> E-31008 Pamplona, Spain
> Tel: +34 948 194700, ext 5023
> email: [EMAIL PROTECTED]
>
> jim holtman wrote:
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > It is hard to give a solution if we don't have the problem statement, or an
> > example of the data structures you are using.
> >
> > On Feb 20, 2008 6:57 AM, Paul Christoph Schröder <[EMAIL PROTECTED]> wrote:
> > > Hello all!
> > >
> > > I have the following problem with the %in% command:
> > >
> > > 1) I have a data frame that consists of functions (rows) and genes
> > > (columns). The whole has been loaded with the "read.delim" command
> > > because of gene duplications between the different rows.
> > >
> > > 2) Now, there is another data frame that contains all the genes (only
> > > the genes and without duplicates) from all the functions of the above
> > > data frame.
> > >
> > > What I want to do now is to use the "%in%" command to obtain a
> > > TRUE-FALSE data frame. This should be a data frame where, for every
> > > function, some genes are TRUE and some are FALSE, depending on whether
> > > or not they were in the specific function when matched against the
> > > "all genes" data frame.
> > >
> > > The main problem I have is the way the genes are stored in the first
> > > data frame. I used the "unlist" command to separate them at the commas
> > > ",". But every time I do the match between the first and second data
> > > frame it returns FALSE for every gene in every function.
> > >
> > > Can anyone please give me a hint how to handle the problem?
> > >
> > > Thank you very much in advance!
> > > Paul
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem you are trying to solve?
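
If a logical matrix like the one drawn in the quoted question is what is wanted, the same split-on-commas idea can be combined with %in%. The call func_gen[,2] %in% all_genes presumably returns only FALSE because each element of func_gen[,2] is one whole comma-separated string, which never equals a single gene name. The following is only a minimal sketch: the data are typed in by hand rather than read from the real file, all_genes is taken to be a plain character vector instead of a data.frame, and the names gene_sets and in_func are made up for illustration.

```r
# Hand-typed stand-in for the data in the quoted question
# (an assumption, not the real file).
func_gen <- data.frame(
  Function = c("Function1", "Function2", "Function3"),
  x = c("gene5, gene19, gene22, gene23",
        "gene1, gene7, gene19",
        "gene2, gene3, gene7, gene23"),
  stringsAsFactors = FALSE
)

# Assumed here to be a character vector, not a one-column data.frame.
all_genes <- c("gene1", "gene2", "gene3", "gene5",
               "gene7", "gene19", "gene22", "gene23")

# Split each row's string on commas, dropping blanks around them.
gene_sets <- strsplit(func_gen$x, "[[:blank:]]*,[[:blank:]]*")

# For every function, test each known gene against that function's genes.
# sapply() returns genes in rows, so transpose to get one row per function.
in_func <- t(sapply(gene_sets, function(g) all_genes %in% g))
dimnames(in_func) <- list(func_gen$Function, all_genes)
in_func
```

With the real data, all_genes[,1] (converted with as.character if it is a factor) would take the place of the hand-typed vector. Any gene name containing a stray blank, such as "gene 22", would not match "gene22" and would need cleaning first.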