It worked for some time. Then I did sparkR.stop() and re-ran it, only to get the same error again. Any idea why it ran fine before? (While it was running fine, it kept warning that the existing Spark context was being reused and that I should restart.) There is one more R script which instantiates Spark; I ran that again too.
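For reference, a minimal clean-restart sequence might look like the sketch below. This is only a sketch, assuming the SparkR API that ships with Spark 1.5/1.6 and the SPARK_HOME and ORC path used later in this thread; adjust both for your cluster. The key points are to stop the old context before re-initializing and not to overwrite sc with the HiveContext.

Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sparkR.stop()                        # tear down any stale context before re-initializing
sc <- sparkR.init()                  # SparkContext
hivecontext <- sparkRHive.init(sc)   # HiveContext built on the same sc; keep sc unchanged
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
head(df)                             # quick sanity check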
On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

> The complete stack trace is below. Could it be something with Java versions?
>
> stop("invalid jobj ", value$id)
> 8
> writeJobj(con, object)
> 7
> writeObject(con, a)
> 6
> writeArgs(rc, args)
> 5
> invokeJava(isStatic = TRUE, className, methodName, ...)
> 4
> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>     source, options)
> 3
> read.df(sqlContext, path, source, schema, ...)
> 2
> loadDF(hivecontext, filepath, "orc")
>
> On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>
>> Running this gave
>>
>> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
>> Error in writeJobj(con, object) : invalid jobj 3
>>
>> How does it know which Hive schema to connect to?
>>
>> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>>> It looks like you have overwritten sc. Could you try this:
>>>
>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>>> library(SparkR)
>>>
>>> sc <- sparkR.init()
>>> hivecontext <- sparkRHive.init(sc)
>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>>
>>> ------------------------------
>>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>>> Subject: Re: sparkR ORC support.
>>> From: sand...@infoworks.io
>>> To: felixcheun...@hotmail.com
>>> CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com; deepakmc...@gmail.com
>>>
>>> The code is very simple and pasted below. hive-site.xml is already in the
>>> Spark conf directory. I still see this error
>>>
>>> Error in writeJobj(con, object) : invalid jobj 3
>>>
>>> after running the script below.
>>>
>>> script
>>> =======
>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>>> library(SparkR)
>>>
>>> sc <<- sparkR.init()
>>> sc <<- sparkRHive.init()
>>> hivecontext <<- sparkRHive.init(sc)
>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>> #View(df)
>>>
>>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>
>>> Yes, as Yanbo suggested, it looks like there is something wrong with the
>>> sqlContext.
>>>
>>> Could you forward us your code please?
>>>
>>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com> wrote:
>>>
>>> You should ensure your sqlContext is a HiveContext.
>>>
>>> sc <- sparkR.init()
>>> sqlContext <- sparkRHive.init(sc)
>>>
>>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:
>>>
>>> Felix
>>>
>>> I tried the option you suggested. It gave the error below. I am going
>>> to try the option suggested by Prem.
>>>
>>> Error in writeJobj(con, object) : invalid jobj 1
>>> 8
>>> stop("invalid jobj ", value$id)
>>> 7
>>> writeJobj(con, object)
>>> 6
>>> writeObject(con, a)
>>> 5
>>> writeArgs(rc, args)
>>> 4
>>> invokeJava(isStatic = TRUE, className, methodName, ...)
>>> 3
>>> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>>>     source, options)
>>> 2
>>> read.df(sqlContext, filepath, "orc") at spark_api.R#108
>>>
>>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>
>>> Firstly, I don't have ORC data to verify, but this should work:
>>>
>>> df <- loadDF(sqlContext, "data/path", "orc")
>>>
>>> Secondly, could you check whether sparkR.stop() was called? sparkRHive.init()
>>> should be called after sparkR.init() - please check if there is any error
>>> message there.
>>>
>>> _____________________________
>>> From: Prem Sure <premsure...@gmail.com>
>>> Sent: Tuesday, January 5, 2016 8:12 AM
>>> Subject: Re: sparkR ORC support.
>>> To: Sandeep Khurana <sand...@infoworks.io>
>>> Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>
>>>
>>> Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
>>>
>>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Also, do I need to set up Hive in Spark as per the link
>>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?
>>>
>>> Might we also need to copy the hdfs-site.xml file to the Spark conf directory?
>>>
>>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Deepak
>>>
>>> Tried this. Getting this error now:
>>>
>>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :
>>>   unused argument ("")
>>>
>>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>
>>> Hi Sandeep
>>> Can you try this?
>>>
>>> results <- sql(hivecontext, "FROM test SELECT id", "")
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Thanks Deepak.
>>>
>>> I tried this as well. I created a HiveContext with "hivecontext <<- sparkRHive.init(sc)".
>>>
>>> When I tried to read a Hive table from it with
>>>
>>> results <- sql(hivecontext, "FROM test SELECT id")
>>>
>>> I got the error below:
>>>
>>> Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If
>>> SparkR was restarted, Spark operations need to be re-executed.
>>>
>>> Not sure what is causing this. Any leads or ideas? I am using RStudio.
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>
>>> Hi Sandeep
>>> I am not sure if ORC can be read directly in R.
>>> But there is a workaround: first create a Hive table on top of the ORC
>>> files and then access that Hive table in R.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Hello
>>>
>>> I need to read ORC files in HDFS in R using Spark. I am not able to
>>> find a package to do that.
>>>
>>> Can anyone help with documentation or an example for this purpose?
--
Architect
Infoworks.io
http://Infoworks.io
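As a sketch of the workaround discussed above (a Hive table defined over the ORC files, queried from SparkR), the steps might look roughly as follows. This assumes the Spark 1.5/1.6 SparkR API, that hive-site.xml is already in Spark's conf directory, and that the table name test, the id column, and the single-column schema are only placeholders taken from the thread, not a real schema.

library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)

# Hypothetical DDL: define an external Hive table over the existing ORC files.
# The column list is a placeholder and must match the actual ORC schema.
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test (id INT)
                  STORED AS ORC LOCATION '/data/ingest/sparktest1/'")

results <- sql(hivecontext, "FROM test SELECT id")   # returns a SparkR DataFrame
head(results)                                        # peek at a few rows
local_df <- collect(results)                         # bring the result into a local R data.frame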