The complete stacktrace is below. Could it be something with Java versions?
8: stop("invalid jobj ", value$id)
7: writeJobj(con, object)
6: writeObject(con, a)
5: writeArgs(rc, args)
4: invokeJava(isStatic = TRUE, className, methodName, ...)
3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2: read.df(sqlContext, path, source, schema, ...)
1: loadDF(hivecontext, filepath, "orc")

On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
> Running this gave
>
> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
> Error in writeJobj(con, object) : invalid jobj 3
>
> How does it know which hive schema to connect to?
>
> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> It looks like you have overwritten sc. Could you try this:
>>
>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>> library(SparkR)
>>
>> sc <- sparkR.init()
>> hivecontext <- sparkRHive.init(sc)
>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>
>> ------------------------------
>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>> Subject: Re: sparkR ORC support.
>> From: sand...@infoworks.io
>> To: felixcheun...@hotmail.com
>> CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com; deepakmc...@gmail.com
>>
>> The code is very simple, pasted below.
>> hive-site.xml is in spark conf already.
>> I still see this error
>>
>> Error in writeJobj(con, object) : invalid jobj 3
>>
>> after running the script below
>>
>> script
>> ======
>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>> library(SparkR)
>>
>> sc <<- sparkR.init()
>> sc <<- sparkRHive.init()
>> hivecontext <<- sparkRHive.init(sc)
>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>> #View(df)
>>
>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext.
>>
>> Could you forward us your code please?
>>
>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com> wrote:
>>
>> You should ensure your sqlContext is a HiveContext:
>>
>> sc <- sparkR.init()
>> sqlContext <- sparkRHive.init(sc)
>>
>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:
>>
>> Felix
>>
>> I tried the option suggested by you. It gave the error below. I am going to try the option suggested by Prem.
>>
>> Error in writeJobj(con, object) : invalid jobj 1
>> 8: stop("invalid jobj ", value$id)
>> 7: writeJobj(con, object)
>> 6: writeObject(con, a)
>> 5: writeArgs(rc, args)
>> 4: invokeJava(isStatic = TRUE, className, methodName, ...)
>> 3: callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
>> 2: read.df(sqlContext, filepath, "orc") at spark_api.R#108
>>
>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> Firstly, I don't have ORC data to verify, but this should work:
>>
>> df <- loadDF(sqlContext, "data/path", "orc")
>>
>> Secondly, could you check if sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.
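[Editor's sketch] The "invalid jobj" failures above are consistent with Felix's diagnosis that sc was overwritten: the script calls sparkRHive.init() twice (once with no SparkContext at all) and assigns with <<-, so the R-side object handles no longer match the live JVM backend. A minimal sketch of the initialization without the reassignments, assuming the Spark 1.x SparkR API and the paths used in this thread:

```r
# Sketch: initialize once, keep sc, and derive the HiveContext from it.
# Assumes Spark 1.x SparkR and a running Spark installation at SPARK_HOME.
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()                  # create the SparkContext exactly once
hivecontext <- sparkRHive.init(sc)   # HiveContext built on that same sc
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
head(df)
```

Note the plain <- assignments and the single sparkRHive.init(sc) call; if sparkR.stop() runs (or RStudio restarts the session) in between, every previously created jobj handle becomes stale and the contexts must be recreated.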
>> _____________________________
>> From: Prem Sure <premsure...@gmail.com>
>> Sent: Tuesday, January 5, 2016 8:12 AM
>> Subject: Re: sparkR ORC support.
>> To: Sandeep Khurana <sand...@infoworks.io>
>> Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>
>>
>> Yes Sandeep, also copy hive-site.xml to the spark conf directory.
>>
>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>
>> Also, do I need to set up hive in spark as per the link
>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?
>>
>> We might need to copy the hdfs-site.xml file to the spark conf directory?
>>
>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>
>> Deepak
>>
>> Tried this. Getting this error now:
>>
>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") : unused argument ("")
>>
>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>> Hi Sandeep
>> can you try this?
>>
>> results <- sql(hivecontext, "FROM test SELECT id", "")
>>
>> Thanks
>> Deepak
>>
>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>
>> Thanks Deepak.
>>
>> I tried this as well. I created a hivecontext with "hivecontext <<- sparkRHive.init(sc)".
>>
>> When I tried to read a hive table from this,
>>
>> results <- sql(hivecontext, "FROM test SELECT id")
>>
>> I get the error below:
>>
>> Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.
>>
>> Not sure what is causing this. Any leads or ideas? I am using RStudio.
>>
>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>
>> Hi Sandeep
>> I am not sure if ORC can be read directly in R.
>> But there can be a workaround: first create a hive table on top of the ORC files and then access the hive table in R.
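[Editor's sketch] The "unused argument" error and Deepak's workaround fit together: in the Spark 1.x SparkR API, sql() takes only the context and the query string, so the trailing "" must be dropped. A sketch of the hive-table-over-ORC workaround under that assumption; the table name, column, and path below are illustrative (taken from the thread's examples), not a verified schema:

```r
# Sketch (Spark 1.x SparkR; assumes sc and hivecontext were created as above,
# and that the ORC files at the given path actually contain an 'id' column).
# sql() takes exactly two arguments: the context and the query string.
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test (id INT)
                  STORED AS ORC LOCATION '/data/ingest/sparktest1/'")
results <- sql(hivecontext, "SELECT id FROM test")
head(results)
```

The external table is only metadata over the existing ORC files, so dropping it later does not delete the data.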
>> Thanks
>> Deepak
>>
>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>
>> Hello
>>
>> I need to read ORC files in hdfs in R using spark. I am not able to find a package to do that.
>>
>> Can anyone help with documentation or an example for this purpose?
>>
>> --
>> Architect
>> Infoworks.io
>> http://Infoworks.io
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net