Hi Dennis

  That those files are in a directory/folder suggests that they were
extracted from their zip (.xlsx) file. The following are the basic
contents of the .xlsx file

     1484  02-28-11 12:48   [Content_Types].xml
      733  02-28-11 12:48   _rels/.rels
      972  02-28-11 12:48   xl/_rels/workbook.xml.rels
      846  02-28-11 12:48   xl/workbook.xml
      940  02-28-11 12:48   xl/styles.xml
     1402  02-28-11 12:48   xl/worksheets/sheet2.xml
     7562  02-28-11 12:48   xl/theme/theme1.xml
     1888  02-28-11 12:48   xl/worksheets/sheet1.xml
      470  02-28-11 12:48   xl/sharedStrings.xml
      196  02-28-11 12:48   xl/calcChain.xml
    21316  02-28-11 12:48   docProps/thumbnail.jpeg
      629  02-28-11 12:48   docProps/core.xml
      828  02-28-11 12:48   docProps/app.xml

If most of these are present, I would explore whether the sender could
give them to you without unzipping them, or make sure that your software
isn't automatically unzipping them for you.
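One quick way to check what you actually have is to look at the first few bytes of each file: a zip archive (which is what an intact .xlsx is) starts with "PK", while a plain XML file starts with "<?xml". The following is only a rough sketch using base R. The function names file_kind and list_parts are mine, not from any package, and the check does not allow for a byte-order mark or leading whitespace before the XML declaration.

```r
# Classify a file by its leading bytes.
# Assumption: zip archives (including .xlsx) begin with "PK" and the
# XML files begin with "<?xml" with no byte-order mark in front.
file_kind <- function(path) {
  magic <- readBin(path, what = "raw", n = 5L)
  if (length(magic) >= 2L && identical(magic[1:2], charToRaw("PK"))) {
    "zip"
  } else if (length(magic) >= 5L && identical(magic, charToRaw("<?xml"))) {
    "xml"
  } else {
    "other"
  }
}

# For a zip archive, list the parts it contains without extracting them.
list_parts <- function(path) {
  unzip(path, list = TRUE)$Name
}

# Possible usage on the files in the current directory
# (directories are skipped, since readBin cannot read them):
#   f <- list.files()
#   f <- f[!file.info(f)$isdir]
#   kinds <- vapply(f, file_kind, character(1))
#   table(kinds)
```

If a file comes back as "zip", list_parts() should show entries like those in the listing above.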

Note that not all files in the .xlsx are sheets, and the worksheet
is the basic entity that corresponds to a .csv file.

The xlsx package and my RExcelXML package will probably get you a fair
bit of the way in extracting the content, but they will probably need
some tinkering since they expect the different components to be in a
zip archive.

There is also an office2010 package, which seems to overlap with what is
in xlsx, and ROOXML, RWordXML and RExcelXML.

   D.

On 8/10/11 7:26 AM, Dennis Fisher wrote:
> R version 2.13.1
> OS X (or Windows)
>
> Colleagues,
>
> I received a number of files with a .xls extension.  These files open in XL
> and, by all appearances, are XL files.  However, it appears to me that the
> files are actually XML:
>
>> readLines(dir()[16])[1:10]
>  [1] "<?xml version=\"1.0\"?>"
>  [2] "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\""
>  [3] " xmlns:o=\"urn:schemas-microsoft-com:office:office\""
>  [4] " xmlns:x=\"urn:schemas-microsoft-com:office:excel\""
>  [5] " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\""
>  [6] " xmlns:html=\"http://www.w3.org/TR/REC-html40\">"
>  [7] " <DocumentProperties xmlns=\"urn:schemas-microsoft-com:office:office\">"
>  [8] "  <Version>12.0</Version>"
>  [9] " </DocumentProperties>"
> [10] " <OfficeDocumentSettings xmlns=\"urn:schemas-microsoft-com:office:office\">"
>
> I had initially tried to read the files using read.xls (gdata) but that
> failed (not surprisingly).  I could open each Excel file, then "save as" csv,
> then use read.csv.  However, there are many files so I would love to have a
> solution that does not require this brute force approach.
>
> Are there any packages that would allow me to read these files without the
> additional steps?
>
> Dennis
>
>
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.