Hi Chris- You can do this more easily using a sample sheet via dba() or, as you suggest, add the peaksets in one at a time using dba.peakset.
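A minimal sketch of the sample sheet route, reusing the hypothetical samp1 file names from the dba.peakset calls in this message. The column names (SampleID, Factor, Condition, Treatment, Replicate, bamReads, bamControl, Peaks, PeakCaller) follow the DiffBind sample sheet convention as I understand it and should be checked against the vignette for your version. Using "raw" as the peak caller is an assumption that each peak file is a plain chromosome/start/end/score table.

```r
# Two sample-sheet rows for one biological sample: one per ChIP mark,
# both pointing at the same Input BAM. File names are placeholders.
samples <- data.frame(
  SampleID   = c("samp1_O-Glc", "samp1_H3K4me3"),
  Factor     = c("O-GlcNAc", "H3K4me3"),
  Condition  = c("male", "male"),
  Treatment  = c("nostress", "nostress"),
  Replicate  = c(1, 1),
  bamReads   = c("samp1_O-GLcNAc.bam", "samp1_O-H3K4me3.bam"),
  bamControl = c("samp1_Input.bam", "samp1_Input.bam"),
  Peaks      = c("samp1HOMERpeaks.txt", "samp1K$windows.txt"),
  PeakCaller = c("raw", "raw"),  # assumption: four-column chr/start/end/score files
  stringsAsFactors = FALSE
)

# Only attempt to build the DBA object if DiffBind is installed and the
# placeholder peak files actually exist on disk.
if (requireNamespace("DiffBind", quietly = TRUE) && all(file.exists(samples$Peaks))) {
  DBA <- DiffBind::dba(sampleSheet = samples)
}
```

The same table could equally be saved as a .csv file and its path passed to dba(); the remaining samples would each add two more rows in the same pattern.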
Each unique peakset (a line in the sample sheet, or a call to dba.peakset) is a unique combination of metadata (including the tissue, factor, condition, treatment, and replicate), peakset (intervals + scores), aligned reads (per ChIP), and an Input. So what you are calling unique biological samples would each have two entries, one per ChIP mark, both using the same Input.

If your peak data are in separate files for each ChIP, the calls to dba.peakset would look like:

    DBA = dba.peakset(DBA, sampID="samp1_O-Glc", peaks="samp1HOMERpeaks.txt",
                      factor="O-GlcNAc", condition="male", treatment="nostress",
                      replicate=1, bamReads="samp1_O-GLcNAc.bam",
                      controlReads="samp1_Input.bam")

    DBA = dba.peakset(DBA, sampID="samp1_H3K4me3", peaks="samp1K$windows.txt",
                      factor="H3K4me3", condition="male", treatment="nostress",
                      replicate=1, bamReads="samp1_O-H3K4me3.bam",
                      controlReads="samp1_Input.bam")

In this case, the "samp1HOMERpeaks.txt" file contains the O-GlcNAc peaks called by HOMER for this sample, in a four-column format (chromosome, start, end, peak score), and the "samp1K$windows.txt" file contains the H3K4me3 scores for the sample. Again, all this is easier to keep track of in a sample sheet.

If you've already combined all the peak scores into one big dataframe with columns for each library, you can pass in the ones you want using the "peaks" parameter. Instead of setting peaks to a file containing the peaks for the sample, pass in a dataframe with the first three columns set to chromosome, start, and end, and the fourth column containing the score for that specific library.

You are probably better off not doing this, and loading all the peak files separately instead: DiffBind will create the combined table for you, and by supplying each individual peak file you can more easily look at how the peaksets overlap.

Cheers-
Rory

On 08/04/2013 20:32, "Christopher Howerton" <chowe...@vet.upenn.edu> wrote:

Hi Rory,

First, thank you for putting together this R package for those of us whose skills lie elsewhere!
I have worked through your vignette, and it appears that your package will do exactly what I require. My problem is loading the data in the appropriate format. Let me give you a few specifics:

Experimental design: a 2 (sex) x 2 (stress/nostress) design
ChIP marks: H3K4me3 & O-GlcNAc (a PTM of interest to our lab)
Libraries per biological sample: H3K4me3, O-GlcNAc and Input

Data I have available to me: Our core has aligned the reads and done the peak calling for me: Homer (I believe) for the H3K4me3, and a custom bin-based approach (5 kb bins) for the O-GlcNAc. I also have all the upstream files available to me.

The format of the data is: colnames = chromosome, start, end, libraries; so there is an entry per genomic locus per library, even though the number might be zero.

So, my question is, what is the appropriate way to read this in and analyze it? Reading through the package documentation, I may want to make a separate R dataframe per library, and then read them into one dba object using the dba.peakset function with peak.format = "raw". If this is the case, how do I specify that there are 3 libraries from the same biological replicate (i.e. H3K4me3, O-GlcNAc and Input)?

I'm sure this is pretty straightforward, so I apologize for bugging you, but I just wanted to make sure I started on the right foot.

Best,
Chris

--
Christopher Howerton, PhD
Postdoctoral Researcher
(215) 898-1368
University of Pennsylvania
201E Vet
3800 Spruce Street
Philadelphia, PA 19104-6046

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel