Please keep the list included in the thread (e.g. use reply-all). I looked at the file and agree that it looks like XML with a UTF-8 byte order mark and Unix line endings, which means it is neither XLS nor XLSX (which is a zipped directory of XML files with DOS line endings). Excel complains but manages to open the file if it has the XLS extension, but I am not aware of any of the usual R Excel packages that will understand this file.
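One way to check what a download really is, regardless of its extension, is to look at its first few bytes. The sketch below is only an illustration: `sniff_format` is a hypothetical helper name (it is not part of any package), and it tests just three well-known signatures, namely the OLE2 header used by true XLS files, the ZIP header used by XLSX, and the UTF-8 byte order mark.

```r
# Hypothetical helper: classify a file by its leading bytes.
sniff_format <- function(path) {
  # Read up to 8 raw bytes from the start of the file.
  bytes <- readBin(path, what = "raw", n = 8L)

  # TRUE when the file begins with the given byte signature.
  starts_with <- function(sig) {
    length(bytes) >= length(sig) && all(bytes[seq_along(sig)] == sig)
  }

  if (starts_with(as.raw(c(0xD0, 0xCF, 0x11, 0xE0)))) {
    "OLE2 container (true XLS)"
  } else if (starts_with(as.raw(c(0x50, 0x4B, 0x03, 0x04)))) {
    "ZIP archive (possibly XLSX)"
  } else if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) {
    "text with UTF-8 byte order mark"
  } else {
    "unknown"
  }
}
```

Calling `sniff_format("ivv.xls")` on the downloaded file should then show which of these cases applies. A ZIP signature alone does not prove the file is XLSX, so treat the result as a hint rather than a definitive answer.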
The byte order mark can be addressed by opening the file with encoding="UTF-8-BOM", but as you mentioned originally the XML structure is still broken (cf. the error message about the Style ending tag). Line 16 seems to use /Style rather than /ss:Style. Maybe

library(XML)
# read through a connection so the BOM is dropped
con <- file( fname, encoding="UTF-8-BOM" )
txt <- readLines( con )
close( con )
# repair the closing tag, then write the result out before parsing it
txt <- sub( "</Style>", "</ss:Style>", txt, fixed=TRUE )
fnamenobom <- "nobom.xml"
writeLines( txt, fnamenobom )
xmlfile <- xmlTreeParse( fnamenobom )

-- 
Sent from my phone. Please excuse my brevity.

On July 28, 2016 8:26:44 AM PDT, "Bos, Roger" <roger....@rothschild.com> wrote:
>Jeff,
>
>Thanks for your suggestions. I mentioned XLS because that is the
>extension the ishares website provides. I have tried many packages
>such as xml, xml2, XLConnect, and readxl. I am not even sure what data
>format the file is, but it looks to me like XML and the extension is
>XLS. If you have the names of specific packages you think I should
>try, that would be very helpful.
>
>Thanks,
>
>Roger
>
>***************************************************************
>This message and any attachments are for the intended recipient's use
>only. This message may contain confidential, proprietary or legally
>privileged information. No right to confidential or privileged
>treatment of this message is waived or lost by an error in
>transmission. If you have received this message in error, please
>immediately notify the sender by e-mail, delete the message, any
>attachments and all copies from your system and destroy any hard
>copies. You must not, directly or indirectly, use, disclose,
>distribute, print or copy any part of this message or any attachments
>if you are not the intended recipient.
>
>-----Original Message-----
>From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
>Sent: Thursday, July 28, 2016 10:34 AM
>To: Bos, Roger; r-help@r-project.org
>Subject: Re: [R] problems reading XML type file from ishares website
>
>XLS has nothing to do with XML.
>The shift from XLS to XLSX/XLSM formats was where XML was introduced.
>You might occasionally find mislabelled files that seem to work
>anyway, but there is a significant difference inside true XLS files.
>
>Use a package designed to handle your data format. There are a few, and
>most seem to require external software support (e.g. Perl or Java or
>Windows OS), so you have to decide what overhead support headaches you
>can tolerate.
>--
>Sent from my phone. Please excuse my brevity.
>
>On July 28, 2016 6:14:28 AM PDT, "Bos, Roger"
><roger....@rothschild.com> wrote:
>>The ishares website has the S&P 500 stocks you can download as a XLS
>>file, which opens fine in Excel, but I am not able to open it in R due
>>to what seems to be invalid XML formatting. I tried using XLConnect
>>and XML as shown below. Does anyone know a workaround or can point out
>>what I am doing wrong? Here is my reproducible code:
>>
>>temp <- "https://www.ishares.com/us/239726/fund-download.dl"
>>fname <- "ivv.xls"
>>download.file(url = temp, destfile = fname)
>>readWorksheetFromFile(fname)
>>library(XML)
>>xmlfile <- xmlTreeParse(fname)
>>
>>09:06:17 > readWorksheetFromFile(fname)
>>Error: InvalidFormatException (Java): Your InputStream was neither an
>>OLE2 stream, nor an OOXML stream
>>09:06:17 > library(XML)
>>09:06:25 > xmlfile <- xmlTreeParse(fname)
>>Opening and ending tag mismatch: Style line 14 and Style
>>Error: 1: Opening and ending tag mismatch: Style line 14 and Style
>>
>>Thanks in advance, Roger
>>
>>______________________________________________
>>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.