Martin,
1) The first map contains the primary-key columns (for a compound primary
key, that map holds every key column), and the second map contains all the
non-key columns.
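To make the row layout concrete, here is a minimal local sketch. The table
schema (a single key column "id" plus "navn" and "revisjon") is assumed for
illustration only, and plain java.nio decoding stands in for Cassandra's
ByteBufferUtil:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

// One row as CqlPagingInputFormat delivers it: the key map holds only the
// primary-key columns, the value map holds everything else.
// (Assumed schema for this sketch: PRIMARY KEY (id), columns navn, revisjon.)
val keyCols = new java.util.HashMap[String, ByteBuffer]()
keyCols.put("id", ByteBuffer.wrap("row-1".getBytes(UTF_8)))

val nonKeyCols = new java.util.HashMap[String, ByteBuffer]()
nonKeyCols.put("navn", ByteBuffer.wrap("mydoc".getBytes(UTF_8)))
nonKeyCols.put("revisjon", ByteBuffer.wrap(Array[Byte](0, 0, 0, 3))) // int 3

// Decoding equivalent to ByteBufferUtil.string / ByteBufferUtil.toInt:
def str(b: ByteBuffer): String = UTF_8.decode(b.duplicate()).toString
def int(b: ByteBuffer): Int = b.duplicate().getInt

println(str(nonKeyCols.get("navn")) + " rev " + int(nonKeyCols.get("revisjon")))
```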
2) Try this fixed code (your version was missing the tuple's closing
parenthesis after ByteBufferUtil.toInt(...)):
val navnrevmap = casRdd.map{
case (key, value) =>
(ByteBufferUtil.string(value.get("navn")),
ByteBufferUtil.toInt(value.get("revisjon")))
}.groupByKey()
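With each row mapped to a (navn, revisjon) pair, groupByKey collects all
revisions per navn. The same semantics can be seen on a plain Scala
collection (sample data invented for the sketch):

```scala
// Local stand-in for the (navn, revisjon) pair RDD.
val pairs = Seq(("doc", 1), ("doc", 2), ("report", 1))

// Spark's groupByKey behaves like this groupBy on a local collection:
// each distinct key maps to all of its values.
val grouped: Map[String, Seq[Int]] =
  pairs.groupBy(_._1).map { case (navn, kvs) => (navn, kvs.map(_._2)) }

println(grouped) // each navn paired with its list of revisions
```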
Mohammed
-----Original Message-----
From: Martin Gammelsæter [mailto:[email protected]]
Sent: Wednesday, July 2, 2014 4:36 AM
To: [email protected]
Subject: How to use groupByKey and CqlPagingInputFormat
Hi!
Total Scala and Spark noob here with a few questions.
I am trying to modify a few of the examples in the spark repo to fit my needs,
but I'm running into a few problems.
I am making an RDD from Cassandra, which I've finally gotten to work, and I'm
now trying to do some operations on it. Specifically, I am trying to group by
key for future calculations.
I want the key to be the column "navn" from a certain column family, but I
don't think I understand the returned types. Why are two Maps returned, instead
of one? I'd think that you'd get a list of some kind with every row, where
every element in the list was a map from column name to the value. So my first
question is: What do these maps represent?
val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
classOf[CqlPagingInputFormat],
classOf[java.util.Map[String,ByteBuffer]],
classOf[java.util.Map[String,ByteBuffer]])
val navnrevmap = casRdd.map({
case (key, value) =>
(ByteBufferUtil.string(value.get("navn")),
ByteBufferUtil.toInt(value.get("revisjon"))
}).groupByKey()
The second question (probably stemming from my not understanding the first)
is: why am I not allowed to do a groupByKey in the above code? I understand
that the type does not have that function, but I'm unclear on what I have to
do to make it work.
--
Best regards,
Martin Gammelsæter