As you can see from my reply below from Jan 6, calling sparkR.stop() invalidates both the sc and hivecontext you have and results in this invalid jobj error.

If you start R and run this, it should work:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")

Is there a reason you want to call stop? If you do, you would need to call the line hivecontext <- sparkRHive.init(sc) again.
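If a restart really is needed, the full sequence would look roughly like this (a minimal sketch reusing the same SPARK_HOME setup and data path as above):

sparkR.stop()                         # invalidates the existing sc and hivecontext
sc <- sparkR.init()                   # create a fresh SparkContext
hivecontext <- sparkRHive.init(sc)    # re-create the HiveContext from the new sc
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")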
_____________________________
From: Sandeep Khurana <sand...@infoworks.io>
Sent: Tuesday, January 12, 2016 5:20 AM
Subject: Re: sparkR ORC support.
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: spark users <user@spark.apache.org>, Prem Sure <premsure...@gmail.com>, Deepak Sharma <deepakmc...@gmail.com>, Yanbo Liang <yblia...@gmail.com>

It worked for some time. Then I did sparkR.stop() and re-ran it, and got the same error again. Any idea why it ran fine before? (While it was running fine it kept warning that it was reusing the existing spark context and that I should restart.) There is one more R script which instantiates spark; I ran that again too.

On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

The complete stacktrace is below. Can it be something with java versions?

stop("invalid jobj ", value$id)
8 writeJobj(con, object)
7 writeObject(con, a)
6 writeArgs(rc, args)
5 invokeJava(isStatic = TRUE, className, methodName, ...)
4 callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
3 read.df(sqlContext, path, source, schema, ...)
2 loadDF(hivecontext, filepath, "orc")

On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Running this gave

16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
Error in writeJobj(con, object) : invalid jobj 3

How does it know which hive schema to connect to?

On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

It looks like you have overwritten sc. Could you try this:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")

Date: Tue, 12 Jan 2016 14:28:58 +0530
Subject: Re: sparkR ORC support.
From: sand...@infoworks.io
To: felixcheun...@hotmail.com
CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com; deepakmc...@gmail.com

The code is very simple, pasted below. hive-site.xml is in the spark conf directory already. I still see this error after running the script below:

Error in writeJobj(con, object) : invalid jobj 3

script
=======
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <<- sparkR.init()
sc <<- sparkRHive.init()
hivecontext <<- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
#View(df)

On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext. Could you forward us your code please?

On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com> wrote:

You should ensure your sqlContext is a HiveContext.

sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)

2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:

Felix

I tried the option suggested by you. It gave the error below. I am going to try the option suggested by Prem.

Error in writeJobj(con, object) : invalid jobj 1
8 stop("invalid jobj ", value$id)
7 writeJobj(con, object)
6 writeObject(con, a)
5 writeArgs(rc, args)
4 invokeJava(isStatic = TRUE, className, methodName, ...)
3 callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2 read.df(sqlContext, filepath, "orc") at spark_api.R#108

On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Firstly, I don't have ORC data to verify, but this should work:

df <- loadDF(sqlContext, "data/path", "orc")

Secondly, could you check if sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.

_____________________________
From: Prem Sure <premsure...@gmail.com>
Sent: Tuesday, January 5, 2016 8:12 AM
Subject: Re: sparkR ORC support.
To: Sandeep Khurana <sand...@infoworks.io>
Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>

Yes Sandeep, also copy hive-site.xml to the spark conf directory.

On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:

Also, do I need to set up hive in spark as per the link http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ? We might need to copy the hdfs-site.xml file to the spark conf directory?

On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Deepak

Tried this. Getting this error now:

Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") : unused argument ("")

On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi Sandeep
Can you try this?

results <- sql(hivecontext, "FROM test SELECT id","")

Thanks
Deepak

On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Thanks Deepak. I tried this as well. I created a hivecontext with "hivecontext <<- sparkRHive.init(sc)". When I tried to read a hive table from it with

results <- sql(hivecontext, "FROM test SELECT id")

I got the error below:

Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.

Not sure what is causing this. Any leads or ideas? I am using rstudio.

On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi Sandeep
I am not sure if ORC can be read directly in R, but there can be a workaround: first create a hive table on top of the ORC files and then access the hive table in R.

Thanks
Deepak

On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Hello

I need to read ORC files in hdfs in R using spark. I am not able to find a package to do that. Can anyone help with documentation or an example for this purpose?

--
Architect
Infoworks.io
http://Infoworks.io

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
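For completeness, here is a rough sketch of the workaround Deepak describes above: define an external Hive table over the ORC files and then query it from SparkR. The table name, column, and location are illustrative assumptions, not taken from the thread:

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
# Hypothetical external table over the ORC directory; columns must match the ORC schema.
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS sparktest1 (id INT)
                  STORED AS ORC LOCATION '/data/ingest/sparktest1/'")
# Query the table through the HiveContext and pull a few rows into a local R data.frame.
results <- sql(hivecontext, "SELECT id FROM sparktest1")
head(collect(results))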