My HBase table contains the following:
ROW          COLUMN+CELL
 Product01   column=cf:ProductFeature, timestamp=1487917201238, value=Feature01
 Product01   column=cf:ProductFeature, timestamp=1487917201239, value=Feature02
 Product01   column=cf:ProductFeature, timestamp=1487917201240, value=Feature03
 Product01   column=cf:Price, timestamp=1487917201242, value=\x012A\xF8
 Product01   column=cf:Location, timestamp=1487917201244, value=Texas
Here VERSIONS is set to 3, so HBase keeps three different values (versions) of the
ProductFeature column. I wrote the following code to create an RDD:
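import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// conf is an HBase Configuration with TableInputFormat.INPUT_TABLE set to the table name
val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
// Keep only the Result (the row's cells) from each (row key, Result) pair
val resultRDD = hbaseRDD.map(tuple => tuple._2)
val testRDD = resultRDD.map(Row.parseRow)
val testDF = testRDD.toDF()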
Here, parseRow is a method that returns a tuple of
(ROW, ProductFeature, Price, Location). However, I only get a single row, with the latest ProductFeature value:
+----------------+----------------+---------+---------+
| Row| ProductFeature| Price| Location|
+----------------+----------------+---------+---------+
| Product01| Feature03| 65| Texas|
+----------------+----------------+---------+---------+
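Two things likely combine to produce that single row, assuming the job uses TableInputFormat's default scan: the scan returns only the newest version of each column unless `TableInputFormat.SCAN_MAXVERSIONS` (`hbase.mapreduce.scan.maxversions`) is set on `conf`, and `map` emits exactly one output element per input `Result`. The plain-Scala sketch below models only the second point so it runs without a cluster. `Cell` is a hypothetical case class (not HBase's `Cell`), `parseRowLatest` is a hypothetical stand-in for `Row.parseRow`, whose body was not posted, and Price is modeled as the already-decoded 65 from the output above.

```scala
object MapIsOneToOne {
  // Hypothetical model of one stored cell version: qualifier, timestamp, value.
  final case class Cell(qualifier: String, timestamp: Long, value: String)

  // Hypothetical stand-in for Row.parseRow: keeps only the newest value of each
  // column, so every row key produces exactly one tuple.
  def parseRowLatest(rowKey: String, cells: Seq[Cell]): (String, String, String, String) = {
    def latest(q: String): String =
      cells.filter(_.qualifier == q).maxBy(_.timestamp).value
    (rowKey, latest("ProductFeature"), latest("Price"), latest("Location"))
  }

  // The cells from the scan output above (Price shown decoded as 65).
  val product01: Seq[Cell] = Seq(
    Cell("ProductFeature", 1487917201238L, "Feature01"),
    Cell("ProductFeature", 1487917201239L, "Feature02"),
    Cell("ProductFeature", 1487917201240L, "Feature03"),
    Cell("Price", 1487917201242L, "65"),
    Cell("Location", 1487917201244L, "Texas")
  )

  // map is one-to-one: one row key in, one tuple out.
  val rows: Seq[(String, String, String, String)] =
    Seq("Product01" -> product01).map { case (k, cs) => parseRowLatest(k, cs) }

  def main(args: Array[String]): Unit = rows.foreach(println)
}
```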
What do I need to change in the code to create a DataFrame with one row for each
value of ProductFeature, like the following:
+----------------+----------------+---------+---------+
|             Row|  ProductFeature|    Price| Location|
+----------------+----------------+---------+---------+
|       Product01|       Feature01|       65|    Texas|
|       Product01|       Feature02|       65|    Texas|
|       Product01|       Feature03|       65|    Texas|
+----------------+----------------+---------+---------+
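A minimal sketch of one common approach, written in plain Scala so it runs without a cluster: emit one tuple per ProductFeature version with `flatMap`, instead of one tuple per row key with `map`, repeating the single-version Price and Location on each output row. `Cell` and `parseRowAllVersions` are hypothetical names. In the real job, the feature versions would presumably come from `result.getColumnCells(Bytes.toBytes("cf"), Bytes.toBytes("ProductFeature"))`, and the scan would also need to request more than one version, for example with `conf.set(TableInputFormat.SCAN_MAXVERSIONS, "3")`; otherwise the older versions are not included in each `Result`.

```scala
object FanOutVersions {
  // Hypothetical model of one stored cell version: qualifier, timestamp, value.
  final case class Cell(qualifier: String, timestamp: Long, value: String)

  // Newest value of a column that is treated as single-valued (Price, Location).
  private def latest(cells: Seq[Cell], q: String): String =
    cells.filter(_.qualifier == q).maxBy(_.timestamp).value

  // Hypothetical parser: one tuple per ProductFeature version, oldest first,
  // with the same Price and Location copied onto every tuple.
  def parseRowAllVersions(rowKey: String, cells: Seq[Cell]): Seq[(String, String, String, String)] = {
    val price = latest(cells, "Price")
    val location = latest(cells, "Location")
    cells
      .filter(_.qualifier == "ProductFeature")
      .sortBy(_.timestamp)
      .map(f => (rowKey, f.value, price, location))
  }

  // The cells from the scan output (Price shown decoded as 65).
  val product01: Seq[Cell] = Seq(
    Cell("ProductFeature", 1487917201238L, "Feature01"),
    Cell("ProductFeature", 1487917201239L, "Feature02"),
    Cell("ProductFeature", 1487917201240L, "Feature03"),
    Cell("Price", 1487917201242L, "65"),
    Cell("Location", 1487917201244L, "Texas")
  )

  // flatMap flattens each per-row Seq, so one row key yields three tuples.
  val rows: Seq[(String, String, String, String)] =
    Seq("Product01" -> product01).flatMap { case (k, cs) => parseRowAllVersions(k, cs) }

  def main(args: Array[String]): Unit = rows.foreach(println)
}
```

In the Spark job, the analogous change would be to replace `resultRDD.map(Row.parseRow)` with `resultRDD.flatMap(...)`, where the function passed in takes a `Result` and returns a sequence of tuples; `toDF()` then produces one DataFrame row per tuple.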
--
View this message in context:
http://apache-hbase.679495.n3.nabble.com/Reading-data-for-a-particular-column-cell-with-2-or-more-values-of-a-same-row-key-tp4086420.html
Sent from the HBase User mailing list archive at Nabble.com.