Is the file being saved as .xls, .xlsx, .csv, .tsv, or .txt?
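The extension alone may not settle this. Excel will open text files, and a tab-separated file can carry a .csv or .txt name. A minimal base-R sketch checks the contents instead. describe_file() is a hypothetical helper, not part of base R or readxl. It assumes the usual file signatures: .xlsx files are ZIP archives starting with the bytes "PK", and legacy .xls files start with D0 CF 11 E0. Anything else is treated as text, and the helper counts tabs versus commas on the first line.

```r
# describe_file() is a hypothetical helper (not part of base R or readxl).
# It reports the file's extension alongside what its contents look like.
describe_file <- function(path) {
  ext <- tolower(tools::file_ext(path))

  # Assumed signatures: .xlsx files are ZIP archives (first bytes "PK"),
  # legacy .xls files start with the OLE2 bytes D0 CF 11 E0.
  magic <- readBin(path, what = "raw", n = 4L)
  if (length(magic) >= 2L && identical(magic[1:2], as.raw(c(0x50, 0x4b)))) {
    return(list(extension = ext, kind = "ZIP container (likely xlsx)"))
  }
  if (identical(magic, as.raw(c(0xd0, 0xcf, 0x11, 0xe0)))) {
    return(list(extension = ext, kind = "OLE2 container (likely xls)"))
  }

  # Otherwise treat it as text and compare tabs vs. commas on the first line.
  first <- readLines(path, n = 1L, warn = FALSE)
  if (length(first) == 0L) {
    return(list(extension = ext, kind = "empty"))
  }
  n_tab   <- lengths(regmatches(first, gregexpr("\t", first, fixed = TRUE)))
  n_comma <- lengths(regmatches(first, gregexpr(",", first, fixed = TRUE)))
  kind <- if (n_tab > n_comma) {
    "tab-separated text"
  } else if (n_comma > 0L) {
    "comma-separated text"
  } else {
    "plain text"
  }
  list(extension = ext, kind = kind)
}

# Example: a tab-separated file that has been given a .csv name.
tmp <- tempfile(fileext = ".csv")
writeLines(c("gene\tvalue", "EGFR\t1.25"), tmp)
describe_file(tmp)$kind   # "tab-separated text", despite the .csv extension
```

If it reports tab-separated text, read.delim() is the reader to use, whatever the extension. readxl::read_excel() is only for genuine .xls/.xlsx files.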
On Wed, Dec 26, 2018 at 10:14 PM Spencer Brackett <spbracket...@saintjosephhs.com> wrote:
> Follow up,
>
> Would read.txt also work, as I am certain that I have both datasets in
> .txt files? As to a previous user's question concerning the .csv nature of the
> supposed Excel file, I am uncertain how it came to be translated as such.
> The file is most certainly in Excel.
>
> On Thu, Dec 27, 2018 at 12:10 AM Spencer Brackett <
> spbracket...@saintjosephhs.com> wrote:
>
>> Caitlin,
>>
>> I tried your command in both RGui and RStudio, but both came up with
>> errors. I believe I made a mistake somewhere in labeling/downloading the
>> files, which is the source of the confusion in R. I will re-examine the
>> files saved on my desktop to determine the error. Regardless, would it be
>> better to use the read.table or read.csv function when attempting to load
>> my datasets? I tried using read_excel in RStudio, as this process seemed much
>> easier; however, it would seem that my proclivity to error prevents such.
>>
>> Best,
>>
>> Spencer
>>
>> On Wed, Dec 26, 2018 at 11:55 PM Caitlin Gibbons <bioprogram...@gmail.com>
>> wrote:
>>
>>> Does this help, Spencer? The read.delim() function assumes a tab
>>> character by default, but I specifically included it using the read.csv
>>> function. The downloaded file is NOT an Excel file, so this should help.
>>>
>>> GBM_protein_expression <- read.csv("C:/Users/Spencer/Desktop/GBM protein_expression.tsv", sep="\t")
>>>
>>> > On Dec 26, 2018, at 9:23 PM, Richard M. Heiberger <r...@temple.edu>
>>> wrote:
>>> >
>>> > This is wrong because the file is a csv file. read_excel is designed
>>> > for xls and xlsx files.
>>> > GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
>>> >
>>> > How did you get a csv? It downloads as tsv.
>>> >
>>> > The statement you should use is in base R; no library() statement is needed.
>>> > >>> > GBM_protein_expression <- read.delim("C:/Users/Spencer/Desktop/GBM >>> > protein_expression.csv") >>> > >>> > read.delim is the same as read.csv except that it sets the sep >>> > argument to "\t". >>> > >>> > >>> > >>> > On Wed, Dec 26, 2018 at 11:11 PM Spencer Brackett >>> > <spbracket...@saintjosephhs.com> wrote: >>> >> >>> >> Sorry, my mistake. >>> >> >>> >> So I could still use read.table and should I try using a .txt version >>> of >>> >> the file to avoid the silent changes you described? >>> >> >>> >> Also, when I tried to simply this process by downloading the dataset >>> onto >>> >> RStudio opposed to R (Gui) I received the following... >>> >> library(readxl) >>> >>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM >>> >> protein_expression.csv") >>> >> Error: Can't establish that the input is either xls or xlsx. >>> >>> View(GBM_protein_expression) >>> >> Error in View : object 'GBM_protein_expression' not found >>> >> Error in gzfile(file, mode) : cannot open the connection >>> >> In addition: Warning message: >>> >> In gzfile(file, mode) : >>> >> cannot open compressed file >>> >> >>> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds', >>> >> probable reason 'No such file or directory' >>> >>> library(readxl) >>> >>> GBM_protein_expression <- >>> >> read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx") >>> >> readxl works best with a newer version of the tibble package. >>> >> You currently have tibble v1.4.2. >>> >> Falling back to column name repair from tibble <= v1.4.2. >>> >> Message displays once per session. >>> >>> View(GBM_protein_expression) >>> >> >>> >> >>> >> Is this perhaps the result of lack of preview (which I did not >>> complete at >>> >> the time I hit import as the preview failed to load), or the fact >>> that the >>> >> excel file itself contains no numerical data, but only TRUE or FALSE >>> >> entries? 
>>> >> >>> >> On Wed, Dec 26, 2018 at 10:59 PM Jeff Newmiller < >>> jdnew...@dcn.davis.ca.us> >>> >> wrote: >>> >> >>> >>> Please always reply-all to keep the list involved. >>> >>> >>> >>> If you used Save As to change the data format to Excel AND the file >>> >>> extension to xlsx, then yes, you should be able to read with readxl. >>> I >>> >>> don't recommend it, though... Excel often changes data silently and >>> in >>> >>> irregularly located places in your file. >>> >>> >>> >>> On December 26, 2018 7:38:16 PM PST, Spencer Brackett < >>> >>> spbracket...@saintjosephhs.com> wrote: >>> >>>> So even if I imported the file form ICGC to my desktop as an excel >>> >>>> file, >>> >>>> and can view and saved the data as such, it is still a TSV? >>> >>>> >>> >>>> On Wed, Dec 26, 2018 at 10:35 PM Jeff Newmiller >>> >>>> <jdnew...@dcn.davis.ca.us> >>> >>>> wrote: >>> >>>> >>> >>>>> CSV and TSV are not Excel files. Yes, I know Excel will open them, >>> >>>> but >>> >>>>> that does not make them Excel files. >>> >>>>> >>> >>>>> Read a TSV file with read.table or read.csv, setting the sep >>> argument >>> >>>> to >>> >>>>> "\t". >>> >>>>> >>> >>>>> On December 26, 2018 7:26:35 PM PST, Spencer Brackett < >>> >>>>> spbracket...@saintjosephhs.com> wrote: >>> >>>>>> I tried importing the file without preview and recieved the >>> >>>>>> following.... >>> >>>>>> >>> >>>>>> library(readxl) >>> >>>>>>> GBM_protein_expression <- >>> read_excel("C:/Users/Spencer/Desktop/GBM >>> >>>>>> protein_expression.csv") >>> >>>>>> Error: Can't establish that the input is either xls or xlsx. 
>>> >>>>>> View(GBM_protein_expression)
>>> >>>>>> Error in View : object 'GBM_protein_expression' not found
>>> >>>>>> Error in gzfile(file, mode) : cannot open the connection
>>> >>>>>> In addition: Warning message:
>>> >>>>>> In gzfile(file, mode) :
>>> >>>>>>   cannot open compressed file 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds', probable reason 'No such file or directory'
>>> >>>>>> library(readxl)
>>> >>>>>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
>>> >>>>>> readxl works best with a newer version of the tibble package.
>>> >>>>>> You currently have tibble v1.4.2.
>>> >>>>>> Falling back to column name repair from tibble <= v1.4.2.
>>> >>>>>> Message displays once per session.
>>> >>>>>> View(GBM_protein_expression)
>>> >>>>>>
>>> >>>>>> Also, the area above my console says that no data is available in the
>>> >>>>>> table. Is this perhaps the result of the lack of a preview, or of the fact
>>> >>>>>> that the Excel file itself contains no numerical data, only TRUE or FALSE
>>> >>>>>> entries?
>>> >>>>>>
>>> >>>>>> On Wed, Dec 26, 2018 at 9:57 PM Spencer Brackett <
>>> >>>>>> spbracket...@saintjosephhs.com> wrote:
>>> >>>>>>
>>> >>>>>>> Hello again,
>>> >>>>>>>
>>> >>>>>>> I worked on reading the file directly into R, as was suggested,
>>> >>>>>>> but have thus far been unsuccessful. This is what I generated on my
>>> >>>>>>> second attempt...
>>> >>>>>>> >>> >>>>>>> GBM protein_expression<-(file.choose(), header=TRUE, sep="\t") >>> >>>>>>> Error: unexpected symbol in "GBM protein_expression" >>> >>>>>>>> GBM >>> >>>>>>> >>> >>>>> >>> >>> >>> >>>>> >>> protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE, >>> >>>>>>> sep="\t") >>> >>>>>>> Error: unexpected symbol in "GBM protein_expression" >>> >>>>>>>> >>> >>>>>>> >>> >>>>>>> What part of the argument is in error? >>> >>>>>>> >>> >>>>>>> Also I tried importing the dataset as an excel file on RStudio to >>> >>>> see >>> >>>>>> if I >>> >>>>>>> could solve my problem that way. However, my imported excel file >>> >>>> has >>> >>>>>> been >>> >>>>>>> stuck in the 'retrieving preview data' and no data is appearing. >>> >>>> Is >>> >>>>>> the >>> >>>>>>> data file prehaps too large or in the wrong format? >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett < >>> >>>>>>> spbracket...@saintjosephhs.com> wrote: >>> >>>>>>> >>> >>>>>>>> Mr. Heiberger, >>> >>>>>>>> >>> >>>>>>>> Thank you for the insight! I will try out suggestion. >>> >>>>>>>> >>> >>>>>>>> Best, >>> >>>>>>>> >>> >>>>>>>> Spencer Brackett >>> >>>>>>>> >>> >>>>>>>> On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger >>> >>>>>> <r...@temple.edu> >>> >>>>>>>> wrote: >>> >>>>>>>> >>> >>>>>>>>> I looked at the first file. It gives an option to download as >>> >>>> TSV >>> >>>>>>>>> (tab separated values). >>> >>>>>>>>> That is the same as CSV except with tabs instead of commas. >>> >>>>>>>>> You do not need any external software to read it. Read the >>> >>>>>> downloaded >>> >>>>>>>>> file directly into R. >>> >>>>>>>>> >>> >>>>>>>>> read.delim looks as if it would work directly on the downloaded >>> >>>>>> file. >>> >>>>>>>>> ?read.delim >>> >>>>>>>>> The notation "\t" means the tab character. >>> >>>>>>>>> >>> >>>>>>>>> As an aside, stay away from notepad. it is too naive for almost >>> >>>>>>>>> anything interesting. 
>>> >>>>>>>>> The specific case I often see is people reading Linux-style text files
>>> >>>>>>>>> with Notepad, which doesn't understand NL-terminated lines. Nicely
>>> >>>>>>>>> formatted text files become illegible.
>>> >>>>>>>>>
>>> >>>>>>>>> On Wed, Dec 26, 2018 at 6:04 PM Spencer Brackett
>>> >>>>>>>>> <spbracket...@saintjosephhs.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Good evening,
>>> >>>>>>>>>>
>>> >>>>>>>>>> I am attempting to analyze the protein expression data contained within
>>> >>>>>>>>>> these two ICGC/TCGA datasets (one for GBM and the other for LGG):
>>> >>>>>>>>>>
>>> >>>>>>>>>> *File for GBM protein expression*:
>>> >>>>>>>>>> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>>> >>>>>>>>>>
>>> >>>>>>>>>> *File for LGG protein expression*:
>>> >>>>>>>>>> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>>> >>>>>>>>>>
>>> >>>>>>>>>> When I tried to transfer the files from .txt (via Notepad) to .csv (via
>>> >>>>>>>>>> Excel), the data appeared in the columns as unorganized, random
>>> >>>>>>>>>> script... not at all how a typical csv should be arranged.
I >>> >>>>>> need >>> >>>>>>>>> the >>> >>>>>>>>>> dataset to be converted into .csv in order to analyze it in R, >>> >>>>>> which >>> >>>>>>>>> is why >>> >>>>>>>>>> I am hoping someone here might help me in doing that. If not, >>> >>>> is >>> >>>>>> there >>> >>>>>>>>>> perhaps some other way that I could analyze the datatsets on >>> >>>> R, >>> >>>>>> which >>> >>>>>>>>> again >>> >>>>>>>>>> is downloaded from the dataportal ICGC? >>> >>>>>>>>>> >>> >>>>>>>>>> Best, >>> >>>>>>>>>> >>> >>>>>>>>>> Spencer Brackett >>> >>>>>>>>>> >>> >>>>>>>>>> [[alternative HTML version deleted]] >>> >>>>>>>>>> >>> >>>>>>>>>> ______________________________________________ >>> >>>>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, >>> >>>> see >>> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>> >>>>>>>>>> PLEASE do read the posting guide >>> >>>>>>>>> http://www.R-project.org/posting-guide.html >>> >>>>>>>>>> and provide commented, minimal, self-contained, reproducible >>> >>>>>> code. >>> >>>>>>>>> >>> >>>>>>>> >>> >>>>>> >>> >>>>>> [[alternative HTML version deleted]] >>> >>>>>> >>> >>>>>> ______________________________________________ >>> >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>> >>>>>> PLEASE do read the posting guide >>> >>>>>> http://www.R-project.org/posting-guide.html >>> >>>>>> and provide commented, minimal, self-contained, reproducible code. >>> >>>>> >>> >>>>> -- >>> >>>>> Sent from my phone. Please excuse my brevity. >>> >>>>> >>> >>> >>> >>> -- >>> >>> Sent from my phone. Please excuse my brevity. 
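On the read.txt question: base R has no read.txt() function. A .txt extension also changes nothing, because read.delim() and read.table() look at the delimiter inside the file, not at its name. The file.choose() attempts above fail for two reasons. First, the object name GBM protein_expression contains a space, which is what produces "unexpected symbol". Second, no reading function precedes the parentheses. If a comma-separated copy is still wanted, R can write one without Excel. tsv_to_csv() below is a hypothetical helper, sketched under the assumption that the first row holds column names.

```r
# tsv_to_csv() is a hypothetical helper: read a tab-separated file and write a
# comma-separated copy, without going through Excel or Notepad.
tsv_to_csv <- function(tsv_path, csv_path) {
  if (identical(tsv_path, csv_path)) {
    stop("csv_path must differ from tsv_path; refusing to overwrite the input")
  }
  # check.names = FALSE keeps non-syntactic column names (e.g. with dashes) as-is.
  dat <- read.delim(tsv_path, check.names = FALSE)
  write.csv(dat, csv_path, row.names = FALSE)
  invisible(dat)
}

# Interactive use (file.choose() opens a file dialog, so it is not run here):
#   GBM_protein_expression <- read.delim(file.choose())
#   tsv_to_csv("path/to/download.tsv", "path/to/copy.csv")   # placeholder paths
```

For analysis alone, the .csv copy is optional. The data frame returned by read.delim() can be used directly.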