To my knowledge, Spark 1.1 comes with HBase 0.94. To use HBase 0.98, you will need:
https://issues.apache.org/jira/browse/SPARK-1297
You can apply the patch and build Spark yourself.

Cheers

On Wed, Nov 12, 2014 at 12:57 PM, Alan Prando <a...@scanboo.com.br> wrote:

> Hi Ted! Thanks for answering...
>
> Maybe I didn't make myself clear... What I need is to read a table from
> HBase using Python in Spark. I'm using HBase 0.98 and Spark 1.1.
>
> My code follows this example:
> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py
>
> My problem is that when I have two (or more) qualifiers in a row key, this
> example returns just one qualifier.
>
> In fact, I've already found a similar question (
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-td18613.html#a18650),
> but I'm not yet able to find the solution.
>
> Do you have any idea?
>
>
> 2014-11-12 18:26 GMT-02:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Can you give us a bit more detail:
>>
>> the HBase release you're using.
>> whether you can reproduce this using the hbase shell.
>>
>> I did the following using the hbase shell against 0.98.4:
>>
>> hbase(main):001:0> create 'test', 'f1'
>> 0 row(s) in 2.9140 seconds
>>
>> => Hbase::Table - test
>> hbase(main):002:0> put 'test', 'row1', 'f1:1', 'value1'
>> 0 row(s) in 0.1040 seconds
>>
>> hbase(main):003:0> put 'test', 'row1', 'f1:2', 'value2'
>> 0 row(s) in 0.0080 seconds
>>
>> hbase(main):004:0> scan 'test'
>> ROW                COLUMN+CELL
>>  row1              column=f1:1, timestamp=1415823887048, value=value1
>>  row1              column=f1:2, timestamp=1415823893857, value=value2
>>
>> Cheers
>>
>> On Wed, Nov 12, 2014 at 11:32 AM, Alan Prando <a...@scanboo.com.br> wrote:
>>
>>> Hi all,
>>>
>>> I'm trying to read an HBase table using this example from github (
>>> https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py),
>>> however I have two qualifiers in a column family.
>>>
>>> Ex.:
>>>
>>> ROW                COLUMN+CELL
>>>  row1              column=f1:1, timestamp=1401883411986, value=value1
>>>  row1              column=f1:2, timestamp=1401883415212, value=value2
>>>  row2              column=f1:1, timestamp=1401883417858, value=value3
>>>  row3              column=f1:1, timestamp=1401883420805, value=value4
>>>
>>> When I run hbase_inputformat.py, the following loop prints row1 just
>>> once:
>>>
>>> output = hbase_rdd.collect()
>>> for (k, v) in output:
>>>     print (k, v)
>>>
>>> Am I doing anything wrong?
>>>
>>> Thanks in advance.
>>>
>>
>>
>
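[Editor's note for the archive: in the Spark 1.1 example, the bundled converter emits only the first cell of each HBase Result, which is why row1 appears once. Newer revisions of hbase_inputformat.py work around this by packing all cells of a row into one newline-separated string of JSON documents and then splitting with flatMapValues. Below is a plain-Python sketch of that splitting step, runnable without Spark; the cell layout and the flat_map_values helper are illustrative assumptions, not the actual converter output.]

```python
import json

# Hypothetical raw output of newAPIHadoopRDD with a converter that packs
# every cell of a row into one string, one JSON document per line.
raw = [
    ("row1", '{"qualifier": "1", "value": "value1"}\n'
             '{"qualifier": "2", "value": "value2"}'),
    ("row2", '{"qualifier": "1", "value": "value3"}'),
]

def flat_map_values(pairs, f):
    """Plain-Python stand-in for RDD.flatMapValues: apply f to each value
    and emit one (key, element) pair per element of the result."""
    return [(k, x) for k, v in pairs for x in f(v)]

# Split each row's packed string into one record per cell, then parse it.
cells = flat_map_values(raw, lambda v: v.split("\n"))
cells = [(k, json.loads(v)) for k, v in cells]

for k, v in cells:
    print(k, v)   # row1 is printed once per qualifier
```

With a real SparkContext, the equivalent chain would look like `hbase_rdd.flatMapValues(lambda v: v.split("\n")).mapValues(json.loads)`, assuming the converter packs cells as above.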