As you can see from my reply below from Jan 6, calling sparkR.stop() invalidates both the sc and hivecontext you have and results in this invalid jobj error.

If you start R and run this, it should work:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")

Is there a reason you want to call stop? If you do, you would need to call the line hivecontext <- sparkRHive.init(sc) again.
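If a restart really is needed, the full sequence would look roughly like this (a minimal sketch reusing the same SPARK_HOME setup and data path as above):

sparkR.stop()                         # invalidates the existing sc and hivecontext
sc <- sparkR.init()                   # create a fresh SparkContext
hivecontext <- sparkRHive.init(sc)    # re-create the HiveContext from the new sc
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")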
_____________________________
From: Sandeep Khurana <sand...@infoworks.io>
Sent: Tuesday, January 12, 2016 5:20 AM
Subject: Re: sparkR ORC support.
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: spark users <user@spark.apache.org>, Prem Sure <premsure...@gmail.com>, Deepak Sharma <deepakmc...@gmail.com>, Yanbo Liang <yblia...@gmail.com>

It worked for some time. Then I did sparkR.stop() and re-ran it, and got the same error again. Any idea why it ran fine before? (While it was running fine it kept warning that it was reusing the existing spark context and that I should restart.) There is one more R script which instantiates spark; I ran that again too.

On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

The complete stacktrace is below. Can it be something with java versions?

stop("invalid jobj ", value$id)
8 writeJobj(con, object)
7 writeObject(con, a)
6 writeArgs(rc, args)
5 invokeJava(isStatic = TRUE, className, methodName, ...)
4 callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
3 read.df(sqlContext, path, source, schema, ...)
2 loadDF(hivecontext, filepath, "orc")

On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Running this gave

16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
Error in writeJobj(con, object) : invalid jobj 3

How does it know which hive schema to connect to?

On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

It looks like you have overwritten sc. Could you try this:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")

Date: Tue, 12 Jan 2016 14:28:58 +0530
Subject: Re: sparkR ORC support.
From: sand...@infoworks.io
To: felixcheun...@hotmail.com
CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com; deepakmc...@gmail.com

The code is very simple, pasted below. hive-site.xml is in the spark conf directory already. I still see this error after running the script below:

Error in writeJobj(con, object) : invalid jobj 3

script
=======
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <<- sparkR.init()
sc <<- sparkRHive.init()
hivecontext <<- sparkRHive.init(sc)
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
#View(df)

On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Yes, as Yanbo suggested, it looks like there is something wrong with the sqlContext. Could you forward us your code please?

On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com> wrote:

You should ensure your sqlContext is a HiveContext.

sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)

2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:

Felix

I tried the option suggested by you. It gave the error below. I am going to try the option suggested by Prem.

Error in writeJobj(con, object) : invalid jobj 1
8 stop("invalid jobj ", value$id)
7 writeJobj(con, object)
6 writeObject(con, a)
5 writeArgs(rc, args)
4 invokeJava(isStatic = TRUE, className, methodName, ...)
3 callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext, source, options)
2 read.df(sqlContext, filepath, "orc") at spark_api.R#108

On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Firstly, I don't have ORC data to verify, but this should work:

df <- loadDF(sqlContext, "data/path", "orc")

Secondly, could you check if sparkR.stop() was called? sparkRHive.init() should be called after sparkR.init() - please check if there is any error message there.

_____________________________
From: Prem Sure <premsure...@gmail.com>
Sent: Tuesday, January 5, 2016 8:12 AM
Subject: Re: sparkR ORC support.
To: Sandeep Khurana <sand...@infoworks.io>
Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>

Yes Sandeep, also copy hive-site.xml to the spark conf directory.

On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io> wrote:

Also, do I need to set up hive in spark as per the link http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ? We might need to copy the hdfs-site.xml file to the spark conf directory?

On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Deepak

Tried this. Getting this error now:

Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") : unused argument ("")

On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi Sandeep
Can you try this?

results <- sql(hivecontext, "FROM test SELECT id","")

Thanks
Deepak

On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Thanks Deepak. I tried this as well. I created a hivecontext with "hivecontext <<- sparkRHive.init(sc)". When I tried to read a hive table from it with

results <- sql(hivecontext, "FROM test SELECT id")

I got the error below:

Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If SparkR was restarted, Spark operations need to be re-executed.

Not sure what is causing this. Any leads or ideas? I am using rstudio.

On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com> wrote:

Hi Sandeep
I am not sure if ORC can be read directly in R, but there can be a workaround: first create a hive table on top of the ORC files and then access the hive table in R.

Thanks
Deepak

On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io> wrote:

Hello

I need to read ORC files in hdfs in R using spark. I am not able to find a package to do that. Can anyone help with documentation or an example for this purpose?

--
Architect
Infoworks.io
http://Infoworks.io

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
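For completeness, here is a rough sketch of the workaround Deepak describes above: define an external Hive table over the ORC files and then query it from SparkR. The table name, column, and location are illustrative assumptions, not taken from the thread:

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)
# Hypothetical external table over the ORC directory; columns must match the ORC schema.
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS sparktest1 (id INT)
                  STORED AS ORC LOCATION '/data/ingest/sparktest1/'")
# Query the table through the HiveContext and pull a few rows into a local R data.frame.
results <- sql(hivecontext, "SELECT id FROM sparktest1")
head(collect(results))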