It worked for some time. Then I did sparkR.stop() and re-ran it, only to get the same error again. Any idea why it ran fine before? (While it was running fine, it kept warning that the existing Spark context was being reused and that I should restart.) There is one more R script which instantiates Spark; I ran that again too.
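For reference, a minimal clean-restart sequence might look like the sketch below. This is only a sketch, assuming the SparkR API that ships with Spark 1.5/1.6 and the SPARK_HOME and ORC path used later in this thread; adjust both for your cluster. The key points are to stop the old context before re-initializing and not to overwrite sc with the HiveContext.

Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sparkR.stop()                        # tear down any stale context before re-initializing
sc <- sparkR.init()                  # SparkContext
hivecontext <- sparkRHive.init(sc)   # HiveContext built on the same sc; keep sc unchanged
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
head(df)                             # quick sanity check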
On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

> The complete stack trace is below. Could it be something with Java versions?
>
> stop("invalid jobj ", value$id)
> 8
> writeJobj(con, object)
> 7
> writeObject(con, a)
> 6
> writeArgs(rc, args)
> 5
> invokeJava(isStatic = TRUE, className, methodName, ...)
> 4
> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>     source, options)
> 3
> read.df(sqlContext, path, source, schema, ...)
> 2
> loadDF(hivecontext, filepath, "orc")
>
> On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>
>> Running this gave
>>
>> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
>> Error in writeJobj(con, object) : invalid jobj 3
>>
>> How does it know which Hive schema to connect to?
>>
>> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>>> It looks like you have overwritten sc. Could you try this:
>>>
>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>>> library(SparkR)
>>>
>>> sc <- sparkR.init()
>>> hivecontext <- sparkRHive.init(sc)
>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>>
>>> ------------------------------
>>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>>> Subject: Re: sparkR ORC support.
>>> From: sand...@infoworks.io
>>> To: felixcheun...@hotmail.com
>>> CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com; deepakmc...@gmail.com
>>>
>>> The code is very simple and pasted below. hive-site.xml is already in the
>>> Spark conf directory. I still see this error
>>>
>>> Error in writeJobj(con, object) : invalid jobj 3
>>>
>>> after running the script below.
>>>
>>> script
>>> =======
>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
>>> library(SparkR)
>>>
>>> sc <<- sparkR.init()
>>> sc <<- sparkRHive.init()
>>> hivecontext <<- sparkRHive.init(sc)
>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>> #View(df)
>>>
>>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>
>>> Yes, as Yanbo suggested, it looks like there is something wrong with the
>>> sqlContext.
>>>
>>> Could you forward us your code please?
>>>
>>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com> wrote:
>>>
>>> You should ensure your sqlContext is a HiveContext.
>>>
>>> sc <- sparkR.init()
>>> sqlContext <- sparkRHive.init(sc)
>>>
>>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:
>>>
>>> Felix
>>>
>>> I tried the option you suggested. It gave the error below. I am going
>>> to try the option suggested by Prem.
>>>
>>> Error in writeJobj(con, object) : invalid jobj 1
>>> 8
>>> stop("invalid jobj ", value$id)
>>> 7
>>> writeJobj(con, object)
>>> 6
>>> writeObject(con, a)
>>> 5
>>> writeArgs(rc, args)
>>> 4
>>> invokeJava(isStatic = TRUE, className, methodName, ...)
>>> 3
>>> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>>>     source, options)
>>> 2
>>> read.df(sqlContext, filepath, "orc") at spark_api.R#108
>>>
>>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>
>>> Firstly, I don't have ORC data to verify, but this should work:
>>>
>>> df <- loadDF(sqlContext, "data/path", "orc")
>>>
>>> Secondly, could you check whether sparkR.stop() was called? sparkRHive.init()
>>> should be called after sparkR.init() - please check if there is any error
>>> message there.
>>>
>>> _____________________________
>>> From: Prem Sure <premsure...@gmail.com>
>>> Sent: Tuesday, January 5, 2016 8:12 AM
>>> Subject: Re: sparkR ORC support.
>>> To: Sandeep Khurana <sand...@infoworks.io>
>>> Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>
>>>
>>> Yes Sandeep, also copy hive-site.xml to the Spark conf directory.
>>>
>>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Also, do I need to set up Hive in Spark as per the link
>>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?
>>>
>>> Might we also need to copy the hdfs-site.xml file to the Spark conf directory?
>>>
>>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Deepak
>>>
>>> Tried this. Getting this error now:
>>>
>>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :
>>>   unused argument ("")
>>>
>>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>
>>> Hi Sandeep
>>> Can you try this?
>>>
>>> results <- sql(hivecontext, "FROM test SELECT id", "")
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Thanks Deepak.
>>>
>>> I tried this as well. I created a HiveContext with "hivecontext <<- sparkRHive.init(sc)".
>>>
>>> When I tried to read a Hive table from it with
>>>
>>> results <- sql(hivecontext, "FROM test SELECT id")
>>>
>>> I got the error below:
>>>
>>> Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If
>>> SparkR was restarted, Spark operations need to be re-executed.
>>>
>>> Not sure what is causing this. Any leads or ideas? I am using RStudio.
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>>>
>>> Hi Sandeep
>>> I am not sure if ORC can be read directly in R.
>>> But there is a workaround: first create a Hive table on top of the ORC
>>> files and then access that Hive table in R.
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:
>>>
>>> Hello
>>>
>>> I need to read ORC files in HDFS in R using Spark. I am not able to
>>> find a package to do that.
>>>
>>> Can anyone help with documentation or an example for this purpose?
--
Architect
Infoworks.io
http://Infoworks.io
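As a sketch of the workaround discussed above (a Hive table defined over the ORC files, queried from SparkR), the steps might look roughly as follows. This assumes the Spark 1.5/1.6 SparkR API, that hive-site.xml is already in Spark's conf directory, and that the table name test, the id column, and the single-column schema are only placeholders taken from the thread, not a real schema.

library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)

# Hypothetical DDL: define an external Hive table over the existing ORC files.
# The column list is a placeholder and must match the actual ORC schema.
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test (id INT)
                  STORED AS ORC LOCATION '/data/ingest/sparktest1/'")

results <- sql(hivecontext, "FROM test SELECT id")   # returns a SparkR DataFrame
head(results)                                        # peek at a few rows
local_df <- collect(results)                         # bring the result into a local R data.frame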