[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

Jonathan Hsieh (JIRA) Tue, 23 Feb 2016 16:51:41 -0800

    [ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159982#comment-15159982 ]


Jonathan Hsieh commented on HBASE-15184:
----------------------------------------

I tested the patch on a live Kerberized cluster and it works for me. Here's 
how I did it, for folks who'd like to reproduce it:

Prereqs:
# Must have a Kerberos-enabled cluster (hbase/hdfs/yarn, etc).
# Spark must be run in YARN containers (Kerberos doesn't work with Spark 
standalone mode).

Procedure: 
# Loaded a table with 100k rows: 'hbase ltt -write 5:1000:160 -num_keys 100000 -tn ltt'
# Granted 'R' access to the 'randomuser' user (YARN needs a user with an id greater than 1000): "grant 'randomuser', 'R', 'ltt'" in the hbase shell.
# Started spark-shell with the hbase classpath: 'sudo -u randomuser SPARK_CLASSPATH=`hbase classpath` spark-shell'
# Ran these lines in the spark shell:
{code}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.{TableName, HBaseConfiguration}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Scan
import org.apache.spark.sql.SQLContext

val tableName = "ltt"
val hbaseConf = HBaseConfiguration.create()
// sc is the SparkContext that spark-shell provides
val hbaseContext = new HBaseContext(sc, hbaseConf)
val scan = new Scan()
scan.setCaching(100)
// Full-table scan as an RDD of (row key, Result) pairs
val getRdd = hbaseContext.hbaseRDD(TableName.valueOf(tableName), scan)
// Print each row key; on YARN this output goes to the executor logs
getRdd.foreach(v => println(Bytes.toString(v._1.get())))
// Copy the row keys to byte arrays, collect them to the driver, and count them
println("Length: " + getRdd.map(r => r._1.copyBytes()).collect().length)
{code}
# Got a count of 100k; declared victory.
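
For anyone repeating the check, here is a minimal sketch of the same count done with the RDD's count() action, which avoids collecting all 100k row keys to the driver. It assumes the same spark-shell session setup as above (so sc is already defined) and the same 'ltt' table; the names expectedRows, countContext, countScan and rowCount are only illustrative and not part of the patch.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.spark.HBaseContext

// Assumes a running spark-shell, so `sc` (the SparkContext) already exists.
// Assumption: the table was loaded with -num_keys 100000 as in the procedure.
val expectedRows = 100000L

val countContext = new HBaseContext(sc, HBaseConfiguration.create())
val countScan = new Scan()
countScan.setCaching(100)

// count() runs on the executors and returns only the total to the driver
val rowCount = countContext.hbaseRDD(TableName.valueOf("ltt"), countScan).count()

println(s"Rows scanned: $rowCount (expected $expectedRows)")
assert(rowCount == expectedRows, "scan returned an unexpected number of rows")
```

If the session from the procedure is still open, the existing hbaseContext and scan can be reused instead of creating new ones.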



> SparkSQL Scan operation doesn't work on kerberos cluster
> --------------------------------------------------------
>
>                 Key: HBASE-15184
>                 URL: https://issues.apache.org/jira/browse/HBASE-15184
>             Project: HBase
>          Issue Type: Bug
>          Components: spark
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15184.1.patch, HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.
> I made a fix for the client but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
