Hi Chris-

You can load the data more easily using a sample sheet via dba() or, as you 
suggest, add the peaksets in one at a time using dba.peakset.

Each unique peakset (line in the samplesheet/call to dba.peakset) is a unique 
combination of metadata (including the tissue, factor, condition, treatment, 
and replicate), peakset (intervals + scores), aligned reads (per ChIP), and an 
Input. So what you are calling unique biological samples would each have two 
entries. If your peak data are in separate files for each ChIP, the calls to 
dba.peakset would look like:

# O-GlcNAc ChIP for sample 1
DBA = dba.peakset(DBA,sampID="samp1_O-Glc",peaks="samp1HOMERpeaks.txt",
                  factor="O-GlcNAc",
                  condition="male",treatment="nostress",replicate=1,
                  bamReads="samp1_O-GLcNAc.bam",
                  bamControl="samp1_Input.bam")

# H3K4me3 ChIP for the same sample, sharing the same Input
DBA = dba.peakset(DBA,sampID="samp1_H3K4me3",peaks="samp1K$windows.txt",
                  factor="H3K4me3",
                  condition="male",treatment="nostress",replicate=1,
                  bamReads="samp1_O-H3K4me3.bam",
                  bamControl="samp1_Input.bam")

In this case, the "samp1HOMERpeaks.txt" file contains the O-GlcNAc peaks called 
by HOMER for this sample, in a four-column format (chromosome, start, end, 
peakscore), and the "samp1K$windows.txt" file contains the H3K4me3 scores for 
the sample.

Again, all this is easier to keep track of in a sample sheet.
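
As a rough sketch of what such a sample sheet could look like for the two 
entries above: the column names are the ones dba() reads from a sample sheet, 
the file names are taken from the calls above, and the control ID 
"samp1_Input", the CSV file name, and the choice of "raw" as the peak caller 
(four-column text file with the score in the fourth column) are assumptions 
made for illustration.

```r
# Sample sheet with one row per ChIP; both rows share the same Input.
# "samp1_Input" and "samplesheet.csv" are hypothetical names.
samples <- data.frame(
  SampleID   = c("samp1_O-Glc", "samp1_H3K4me3"),
  Factor     = c("O-GlcNAc", "H3K4me3"),
  Condition  = c("male", "male"),
  Treatment  = c("nostress", "nostress"),
  Replicate  = c(1, 1),
  bamReads   = c("samp1_O-GLcNAc.bam", "samp1_O-H3K4me3.bam"),
  ControlID  = c("samp1_Input", "samp1_Input"),
  bamControl = c("samp1_Input.bam", "samp1_Input.bam"),
  Peaks      = c("samp1HOMERpeaks.txt", "samp1K$windows.txt"),
  PeakCaller = c("raw", "raw"),
  stringsAsFactors = FALSE
)
write.csv(samples, "samplesheet.csv", row.names = FALSE)

# With DiffBind installed and the peak/bam files in place, load it with:
# library(DiffBind)
# DBA <- dba(sampleSheet = "samplesheet.csv")
```

The remaining samples would be added as further rows, two per biological 
sample.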

If you've already combined all the peak scores into one big dataframe with 
columns for each library, you can pass in the ones you want using the "peaks" 
parameter. Instead of setting peaks to a file containing the peaks for the 
sample, pass in a dataframe with the first three columns set to chromosome, 
start, and end, and the fourth column containing the score for that specific 
library. You are probably better off not doing this, and loading all the peak 
files separately, as DiffBind will create the combined table for you; by 
supplying each individual peak file you can more easily look at how the 
peaksets overlap.
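
If you do go the dataframe route, pulling out one library could look roughly 
like this. The combined table, its column names, and the helper peaks_for() 
are made up for illustration; only the four-column layout (chromosome, start, 
end, score) comes from the description above.

```r
# Toy stand-in for a combined table: one row per interval,
# one score column per library (all names and values are hypothetical).
combined <- data.frame(
  chromosome    = c("chr1", "chr1", "chr2"),
  start         = c(1, 5001, 1),
  end           = c(5000, 10000, 5000),
  samp1_OGlcNAc = c(12, 0, 7),
  samp1_H3K4me3 = c(3, 25, 0),
  stringsAsFactors = FALSE
)

# Build the four-column dataframe for a single library.
peaks_for <- function(tbl, library_col) {
  out <- tbl[, c("chromosome", "start", "end", library_col)]
  names(out)[4] <- "score"
  out
}

oglc <- peaks_for(combined, "samp1_OGlcNAc")

# With DiffBind loaded, the dataframe is passed in place of a file name:
# DBA <- dba.peakset(DBA, peaks = oglc, sampID = "samp1_O-Glc",
#                    factor = "O-GlcNAc", condition = "male",
#                    treatment = "nostress", replicate = 1)
```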

Cheers-
Rory

On 08/04/2013 20:32, "Christopher Howerton" <chowe...@vet.upenn.edu> wrote:

Hi Rory,

First, thank you for putting together this R package for those of us whose 
skills lie elsewhere! I have worked through your vignette, and it appears that 
your package will do exactly what I require. My problem can be framed as having 
difficulties loading data into the appropriate format. Let me give you a few 
specifics:

Experimental design: a 2(sex) X 2(stress/nostress)
ChIP marks: H3K4me3 & O-GlcNAc (a PTM of interest to our lab)
libraries/biological sample: H3K4me3, O-GlcNAc and Input

Data I have available to me: Our core has aligned the reads and done the peak 
calling for me: Homer (I believe) for the H3K4me3, and a custom bin-based 
approach (5kb bins) for the O-GlcNAc. I also have all the upstream files 
available.

The format for the data is: colnames = chromosome, start, end, libraries; so 
there is an entry per genomic locus per library, even though the number might 
be zero.

So, my question is: what is the appropriate way to read this in and analyze it? 
Reading through the package documentation, I may want to make separate R 
dataframes per library, and then read them into one dba object using the 
dba.peakset function with peak.format = "raw". If this is the case, how do I 
specify that there are 3 libraries from the same biological replicate (i.e. 
H3K4me3, O-GlcNAc and Input)?

I'm sure this is pretty straightforward, so I apologize for bugging you, but I 
just wanted to make sure I started on the right foot.

Best,

Chris

--
Christopher Howerton, PhD
Postdoctoral Researcher
(215) 898-1368
University of Pennsylvania
201E Vet
3800 Spruce Street
Philadelphia, PA 19104-6046




_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
