Hello all,

To begin my analysis, I downloaded two TCGA datasets (GBM and LGG), both CSV files, into an R script after loading the cBioLite package. Following this, I entered the following command:

> the_data<-read.csv(file=“c:/file_name.csv,header=TRUE,sep=“,”)

Upon running the line I received this...

+

If I continue to press Enter, the + sign continues to appear on every new line. Does anyone know what this indicates and how I can continue with my analysis?

My next step after this would have been the following (the numbers before each command are line markers, not part of the line):

1 library(TCGAbiolinks)
2
3 # Download the DNA methylation data: HumanMethylation450 LGG and GBM.
4 path <– "."

Best wishes,

Spencer Brackett

On Sun, Aug 26, 2018 at 9:13 PM Caitlin <bioprogram...@gmail.com> wrote:

> You're welcome Spencer :)
>
> I hope I was able to help you. If this problem persists, or a new one
> appears, feel free to post or email. You might also like:
>
> https://www.biostars.org/
>
> It is quite similar to StackOverflow but with a biological sciences focus.
>
> Hope this helps!
>
> ~Caitlin
>
> On Sun, Aug 26, 2018 at 6:02 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
>
>> Caitlin,
>>
>> Thanks again! I already have the data stored in those two CSV files
>> on my desktop, but if running them through this function does not work,
>> then I will try it with a flash drive.
>>
>> Best,
>>
>> Spencer Brackett
>>
>> On Sun, Aug 26, 2018 at 8:56 PM Caitlin <bioprogram...@gmail.com> wrote:
>>
>>> Hmm...could you store each in its own file (a flash drive would be fine)
>>> then use:
>>>
>>> the_data <- read.csv(file="c:/file_name.csv", header=TRUE, sep=",")
>>>
>>> to read each into your script? The data would then exist as a data frame
>>> object that you could then work with.
>>>
>>> On Sun, Aug 26, 2018 at 5:50 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
>>>
>>>> Caitlin,
>>>>
>>>> Perhaps that is the problem. To be more specific, the data was
>>>> transferred from the TCGA database to a CSV file... there are technically
>>>> two separate CSV files for this analysis: one for GBM and one for LGG.
>>>> Both CSV files were then individually downloaded onto my open R console.
>>>> Upon arranging them with the summary() function, the data expanded and
>>>> took up the whole console page... even seemingly abrogating the arguments
>>>> which allowed for the data to be downloaded onto R in the first place. Are
>>>> you suggesting that I would need to use a flash drive to successfully
>>>> utilize the function you suggested? Or could I perhaps do so with the CSV
>>>> files I mentioned? If so, how?
>>>>
>>>> -Spencer B
>>>>
>>>> On Sun, Aug 26, 2018 at 8:42 PM Caitlin <bioprogram...@gmail.com> wrote:
>>>>
>>>>> No worries Spencer. There is no downloaded data? Nothing is physically
>>>>> stored on your hard drive? The dot in the path would be interpreted (no
>>>>> pun intended!) as something like the following:
>>>>>
>>>>> If the TCGA data was stored in a file named "tcga_data.dat" and it was
>>>>> in a directory named "C:\spencer", the 4th line of that script would set
>>>>> the path to "C:\spencer" (the folder holding "tcga_data.dat") if you ran
>>>>> the script from that same folder. If your TCGA data is not stored in the
>>>>> same folder from which the script is being run, it won't find any data
>>>>> to work with. Does this help?
>>>>>
>>>>> On Sun, Aug 26, 2018 at 5:34 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
>>>>>
>>>>>> Caitlin,
>>>>>>
>>>>>> Forgive me, but I’m not quite sure exactly what your question is
>>>>>> asking. The data is originally from the TCGA and I have it downloaded onto
>>>>>> another R script. I opened a new script to perform the functions I posted
>>>>>> to this forum because I was unable to input any other commands into the
>>>>>> console, due to the fact that the translated data filled the entirety of
>>>>>> said console. Perhaps overloaded it? Regardless, I was unable to input any
>>>>>> further commands.
>>>>>>
>>>>>> -Spencer Brackett
>>>>>>
>>>>>> On Sun, Aug 26, 2018 at 8:27 PM Caitlin <bioprogram...@gmail.com> wrote:
>>>>>>
>>>>>>> You're welcome Spencer :)
>>>>>>>
>>>>>>> The 4th line:
>>>>>>>
>>>>>>> path <– "."
>>>>>>>
>>>>>>> refers to the current directory (the dot in other words). Is the
>>>>>>> data stored in the same directory where the code is being run?
>>>>>>>
>>>>>>> On Sun, Aug 26, 2018 at 5:22 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
>>>>>>>
>>>>>>>> Thank you! I will make note of that. Unfortunately, lines 1 and 4
>>>>>>>> of the first portion of this analysis appear to be where the error
>>>>>>>> begins... to which several subsequent lines also come up as ‘errored’.
>>>>>>>> Perhaps this is an issue of the capitalization and/or spacing (something
>>>>>>>> within the text)? The proposed method for methylation data extraction is
>>>>>>>> based on the first third of the following TCGA workflow:
>>>>>>>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5302158/#!po=0.0715308
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Spencer Brackett
>>>>>>>>
>>>>>>>> On Sun, Aug 26, 2018 at 8:07 PM Caitlin <bioprogram...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Spencer.
>>>>>>>>>
>>>>>>>>> Should you capitalize the following library import?
>>>>>>>>>
>>>>>>>>> library(summarizedExperiment)
>>>>>>>>>
>>>>>>>>> In other words, I think that line should be:
>>>>>>>>>
>>>>>>>>> library(SummarizedExperiment)
>>>>>>>>>
>>>>>>>>> Hope this helps.
>>>>>>>>>
>>>>>>>>> ~Caitlin
>>>>>>>>>
>>>>>>>>> On Sun, Aug 26, 2018 at 2:09 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
>>>>>>>>>
>>>>>>>>>> Good evening,
>>>>>>>>>>
>>>>>>>>>> I am attempting to run the following analysis on TCGA data, however
>>>>>>>>>> something is being reported as an error in my arguments... any ideas as to
>>>>>>>>>> what is incorrect in the following? Thanks!
>>>>>>>>>>
>>>>>>>>>> 1 library(TCGAbiolinks)
>>>>>>>>>> 2
>>>>>>>>>> 3 # Download the DNA methylation data: HumanMethylation450 LGG and GBM.
>>>>>>>>>> 4 path <– "."
>>>>>>>>>> 5
>>>>>>>>>> 6 query.met <– TCGAquery(tumor = c("LGG","GBM"),"HumanMethylation450", level = 3)
>>>>>>>>>> 7 TCGAdownload(query.met, path = path )
>>>>>>>>>> 8 met <– TCGAprepare(query = query.met,dir = path,
>>>>>>>>>> 9 add.subtype = TRUE, add.clinical = TRUE,
>>>>>>>>>> 10 summarizedExperiment = TRUE,
>>>>>>>>>> 11 save = TRUE, filename = "lgg_gbm_met.rda")
>>>>>>>>>> 12
>>>>>>>>>> 13 # Download the expression data: IlluminaHiSeq_RNASeqV2 LGG and GBM.
>>>>>>>>>> 14 query.exp <– TCGAquery(tumor = c("lgg","gbm"), platform = "IlluminaHiSeq_RNASeqV2",level = 3)
>>>>>>>>>> 15
>>>>>>>>>> 16 TCGAdownload(query.exp,path = path, type = "rsem.genes.normalized_results")
>>>>>>>>>> 17
>>>>>>>>>> 18 exp <– TCGAprepare(query = query.exp, dir = path,
>>>>>>>>>> 19 summarizedExperiment = TRUE,
>>>>>>>>>> 20 add.subtype = TRUE, add.clinical = TRUE,
>>>>>>>>>> 21 type = "rsem.genes.normalized_results",
>>>>>>>>>> 22 save = T,filename = "lgg_gbm_exp.rda")
>>>>>>>>>>
>>>>>>>>>> To download data on DNA methylation and gene expression…
>>>>>>>>>>
>>>>>>>>>> 1 library(summarizedExperiment)
>>>>>>>>>> 2 # get expression matrix
>>>>>>>>>> 3 data <– assay(exp)
>>>>>>>>>> 4
>>>>>>>>>> 5 # get sample information
>>>>>>>>>> 6 sample.info <– colData(exp)
>>>>>>>>>> 7
>>>>>>>>>> 8 # get genes information
>>>>>>>>>> 9 genes.info <– rowRanges(exp)
>>>>>>>>>>
>>>>>>>>>> Following stepwise procedure for obtaining GBM and LGG clinical data…
>>>>>>>>>>
>>>>>>>>>> 1 # get clinical patient data for GBM samples
>>>>>>>>>> 2 gbm_clin <– TCGAquery_clinic("gbm","clinical_patient")
>>>>>>>>>> 3
>>>>>>>>>> 4 # get clinical patient data for LGG samples
>>>>>>>>>> 5 lgg_clin <– TCGAquery_clinic("lgg","clinical_patient")
>>>>>>>>>> 6
>>>>>>>>>> 7 # Bind the results, as the columns might not be the same,
>>>>>>>>>> 8 # we will plyr rbind.fill , to have all columns from both files
>>>>>>>>>> 9 clinical <– plyr::rbind.fill(gbm_clin ,lgg_clin)
>>>>>>>>>> 10
>>>>>>>>>> 11 # Other clinical files can be downloaded,
>>>>>>>>>> 12 # Use ?TCGAquery_clinic for more information
>>>>>>>>>> 13 clin_radiation <– TCGAquery_clinic("lgg","clinical_radiation")
>>>>>>>>>> 14
>>>>>>>>>> 15 # Also, you can get clinical information from different tumor types.
>>>>>>>>>> 16 # For example sample 1 is GBM, sample 2 and 3 are TGCT
>>>>>>>>>> 17 data <– TCGAquery_clinic(clinical_data_type = "clinical_patient",
>>>>>>>>>> 18 samples = c("TCGA-06-5416-01A-01D-1481-05",
>>>>>>>>>> 19 "TCGA-2G-AAEW-01A-11D-A42Z-05",
>>>>>>>>>> 20 "TCGA-2G-AAEX-01A-11D-A42Z-05"))
>>>>>>>>>>
>>>>>>>>>> # Searching idat file for DNA methylation
>>>>>>>>>> query <- GDCquery(project = "TCGA-GBM",
>>>>>>>>>> data.category = "Raw microarray data",
>>>>>>>>>> data.type = "Raw intensities",
>>>>>>>>>> experimental.strategy = "Methylation array",
>>>>>>>>>> legacy = TRUE,
>>>>>>>>>> file.type = ".idat",
>>>>>>>>>> platform = "Illumina Human Methylation 450")
>>>>>>>>>>
>>>>>>>>>> **Repeat for LGG**
>>>>>>>>>>
>>>>>>>>>> To access mutational information concerning TMZ methylation…
>>>>>>>>>>
>>>>>>>>>> > mutation <– TCGAquery_maf(tumor = "lgg")
>>>>>>>>>> 2 Getting maf tables
>>>>>>>>>> 3 Source: https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files
>>>>>>>>>> 4 We found these maf files below:
>>>>>>>>>> 5 MAF.File.Name
>>>>>>>>>> 6 2 hgsc.bcm.edu_LGG.IlluminaGA_DNASeq.1.somatic.maf
>>>>>>>>>> 7
>>>>>>>>>> 8 3 LGG_FINAL_ANALYSIS.aggregated.capture.tcga.uuid.curated.somatic.maf
>>>>>>>>>> 9
>>>>>>>>>> 10 Archive.Name Deploy.Date
>>>>>>>>>> 11 2 hgsc.bcm.edu_LGG.IlluminaGA_DNASeq_automated.Level_2.1.0.0 10-DEC-13
>>>>>>>>>> 12 3 broad.mit.edu_LGG.IlluminaGA_DNASeq_curated.Level_2.1.3.0 24-DEC-14
>>>>>>>>>> 13
>>>>>>>>>> 14 Please, select the line that you want to download: 3
>>>>>>>>>>
>>>>>>>>>> **Repeat this for GBM**
>>>>>>>>>>
>>>>>>>>>> Selecting specified lines to download…
>>>>>>>>>>
>>>>>>>>>> 1 gbm.subtypes <− TCGAquery_subtype(tumor = "gbm")
>>>>>>>>>> 2 lgg.subtypes <− TCGAquery_subtype(tumor = "lgg”)
>>>>>>>>>>
>>>>>>>>>> Downloading data via the Bioconductor package RTCGAToolbox…
>>>>>>>>>>
>>>>>>>>>> library(RTCGAToolbox)
>>>>>>>>>> 2
>>>>>>>>>> 3 # Get the last run dates
>>>>>>>>>> 4 lastRunDate <− getFirehoseRunningDates()[1]
>>>>>>>>>> 5 lastAnalyseDate <− getFirehoseAnalyzeDates(1)
>>>>>>>>>> 6
>>>>>>>>>> 7 # get DNA methylation data, RNAseq2 and clinical data for LGG
>>>>>>>>>> 8 lgg.data <− getFirehoseData(dataset = "LGG",
>>>>>>>>>> 9 gistic2_Date = getFirehoseAnalyzeDates(1), runDate = lastRunDate,
>>>>>>>>>> 10 Methylation = TRUE, RNAseq2_Gene_Norm = TRUE, Clinic = TRUE,
>>>>>>>>>> 11 Mutation = T,
>>>>>>>>>> 12 fileSizeLimit = 10000)
>>>>>>>>>> 13
>>>>>>>>>> 14 # get DNA methylation data, RNAseq2 and clinical data for GBM
>>>>>>>>>> 15 gbm.data <− getFirehoseData(dataset = "GBM",
>>>>>>>>>> 16 runDate = lastDate, gistic2_Date = getFirehoseAnalyzeDates(1),
>>>>>>>>>> 17 Methylation = TRUE, Clinic = TRUE, RNAseq2_Gene_Norm = TRUE,
>>>>>>>>>> 18 fileSizeLimit = 10000)
>>>>>>>>>> 19
>>>>>>>>>> 20 # To access the data you should use the getData function
>>>>>>>>>> 21 # or simply access with @ (for example gbm.data@Clinical)
>>>>>>>>>> 22 gbm.mut <− getData(gbm.data,"Mutations")
>>>>>>>>>> 23 gbm.clin <− getData(gbm.data,"Clinical")
>>>>>>>>>> 24 gbm.gistic <− getData(gbm.data,"GISTIC")
>>>>>>>>>>
>>>>>>>>>> Genomic Analysis/Final data extraction:
>>>>>>>>>>
>>>>>>>>>> Enable “getData” to access the data
>>>>>>>>>>
>>>>>>>>>> Obtaining GISTIC results…
>>>>>>>>>>
>>>>>>>>>> 1 # Download GISTIC results
>>>>>>>>>> 2 gistic <− getFirehoseData("GBM",gistic2_Date ="20141017" )
>>>>>>>>>> 3
>>>>>>>>>> 4 # get GISTIC results
>>>>>>>>>> 5 gistic.allbygene <− gistic@GISTIC@AllByGene
>>>>>>>>>> 6 gistic.thresholedbygene <− gistic@GISTIC@ThresholedByGene
>>>>>>>>>>
>>>>>>>>>> Repeat this procedure to obtain LGG GISTIC results.
>>>>>>>>>>
>>>>>>>>>> ***Please ignore the 'non-coded' text as they are procedural
>>>>>>>>>> steps/classifications***
>>>>>>>>>>
>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
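
A note on the + shown in the newest message at the top of this thread: in R, a + at the start of a console line is the continuation prompt, which means R is still waiting for the rest of an expression. In the read.csv() line as posted, the file path has no closing quote after file_name.csv, so the quotes in that call are unbalanced, and that is a likely reason R keeps waiting. Pressing Esc in RGui or RStudio (Ctrl+C in a terminal session) should cancel the pending input and bring back the > prompt. Below is a minimal sketch of the same call with balanced straight quotes. The temporary file and its rows are made-up stand-ins for illustration, not TCGA data.

```r
# Write a tiny stand-in CSV to a temporary file.
# (Hypothetical data for illustration only; not TCGA data.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sample,value", "A,1", "B,2"), csv_path)

# Every opening straight quote has a matching closing quote, so R sees a
# complete expression and returns to the ">" prompt instead of showing "+".
the_data <- read.csv(file = csv_path, header = TRUE, sep = ",")

# Look at the structure compactly rather than printing the whole table,
# which helps when a large data set would otherwise flood the console.
str(the_data)
```

With the real files, csv_path would be replaced by the actual location of each CSV, and str() or head() can be used in place of printing the full data frame.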