From looking at the XLConnect package, its loadWorkbook() function only supports reading from a local file path, so you might need a way to call an HDFS command to get the file from HDFS first.
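A minimal sketch of that copy step, assuming the `hdfs` command-line client is installed and configured on the machine where the R driver runs. The function name `fetch_from_hdfs`, the `HDFS_CMD` override variable, and both paths are made up for illustration; only `hdfs dfs -get` itself is the standard Hadoop command.

```shell
# Sketch: copy a workbook out of HDFS to local disk so that a local-only
# reader such as XLConnect's loadWorkbook() can open it.

# fetch_from_hdfs SRC DST
#   SRC: path of the file on HDFS
#   DST: local path to write to
# HDFS_CMD is a hypothetical override for the client binary name (not a
# Hadoop setting); it defaults to `hdfs`.
fetch_from_hdfs() {
  local src="$1" dst="$2"
  "${HDFS_CMD:-hdfs}" dfs -get "$src" "$dst"
}

# Example call (hypothetical paths; needs a working Hadoop client):
#   fetch_from_hdfs /data/reports/sales.xls /tmp/sales.xls
```

Once the file is on local disk, something like `loadWorkbook("/tmp/sales.xls")` should work from R. The same command could probably also be issued from inside the R session with `system2()`, so the whole thing stays in one script.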
SparkR currently does not support this - you could read it in as a text file (I don't think .xlsx is a text format, though), collect to get all the data at the driver, then perhaps save it to a local path?

On Wed, Jul 20, 2016 at 3:48 AM -0700, "Rabin Banerjee" <dev.rabin.baner...@gmail.com> wrote:

Hi Yogesh,

I have never tried reading XLS files using Spark. But I think you can use sc.wholeTextFiles to read the complete XLS file at once; as XLS files are XML internally, you need to read each one in full to parse it. Then I think you can use Apache POI to read them.

Also, you can copy your XLS data to an MS Access file and access it via JDBC.

Regards,
Rabin Banerjee

On Wed, Jul 20, 2016 at 2:12 PM, Yogesh Vyas <informy...@gmail.com> wrote:

Hi,

I am trying to load and read Excel sheets from HDFS in SparkR using the XLConnect package. Can anyone help me find out how to read XLS files from HDFS in SparkR?

Regards,
Yogesh