Any TCGA MAFs released to the public were considered deidentified, so that wouldn't be the part I would worry about. It's a nice idea, and a data package (or packages) seems like the idiomatic way to do it, as you noted. Personally, I think it would indeed benefit a lot of people (vs., say, GDC). maftools is a super handy package for visualization.
Like Kasper, I am not speaking for bioc-core, just as a TCGA author who spent a lot of time discussing releases with our DCC back-in-the-day.

--t

> On Oct 1, 2017, at 9:29 PM, Kasper Daniel Hansen
> <kasperdanielhan...@gmail.com> wrote:
>
> I cannot speak for the core team.
>
> You should separate the data from the software methods and provide a data
> package containing the MAFs. This has the additional advantage of
> separating versionning of the mutation data from your software. As a data
> package this does not sound extensive; the largest dataset is 3.7Mb. There
> is a potential privacy problem with sharing mutations, but I don't know at
> what level the mutations are described. I assume you have considered this?
>
> Best,
> Kasper
>
>> On Sun, Oct 1, 2017 at 9:16 PM, Anand MT <anand...@hotmail.com> wrote:
>>
>> Hi all,
>>
>> I maintain maftools package which offers multitude of functions to perform
>> various analyses and visualization of MAF (Mutation Annotation Format)
>> files from cancer cohorts.
>>
>> In the upcoming bioconductor release, I plan to include all MAFs from 32
>> TCGA cohorts as a part of the package. These tcga mafs will be stored as
>> MAF objects containing curated somatic mutations along with clinical
>> information in the extdata directory and can be loaded via a “tcga_load”
>> function.
>>
>> I think this will help many researchers working with tcga mutation data
>> and saves the time and hassle of going through various databases to search
>> and assemble. I believe this also helps in reproducible research.
>>
>> However, size of these MAF objects vary according to the cohorts size and
>> mutation burden; with LAML (leukemia) being the smallest (91 kb) and LUAD
>> (Lung Adeno Carcinoma) being the largest (3.7 mb). Also including these
>> MAFs increases package size to 46 mb (from 7mb without theses datasets).
>>
>> My question is,
>>
>> * is it okay for a package to be of this size ?
>> * I haven't tried to push these commits to repository yet, but in case
>> git rejects my push due to size limit, is it possible to make an exception,
>> given the scenario ?
>>
>> If this can't be done in any ways or if it breaks any rules of package
>> guidelines, I don't mind dropping the idea either.
>>
>> Thanks.
>>
>> -Anand.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel