Hi Chirag,

Maybe something like this?
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val rdd = sc.parallelize(Seq(
  Row("A1", "B1", "C1"),
  Row("A2", "B2", "C2"),
  Row("A3", "B3", "C2"),
  Row("A1", "B1", "C1")
))

val schema = StructType(Seq("a", "b", "c").map(c => StructField(c, StringType)))

val df = sqlContext.createDataFrame(rdd, schema)
df.registerTempTable("rows")

sqlContext.sql("select a, b, c, count(0) as count from rows group by a, b, c").collect()

One caveat: createDataFrame was introduced in Spark 1.3, so on your 1.2.0 cluster you would use sqlContext.applySchema(rdd, schema) instead, which returns a SchemaRDD that you can register as a temp table the same way. If you'd rather skip SQL entirely, there's a plain RDD sketch after the quoted message below.

Eric

On Thu, Sep 10, 2015 at 2:19 AM, Chirag Dewan <chirag.de...@ericsson.com>
wrote:

> Hi,
>
> I am using Spark 1.2.0 with Cassandra 2.0.14. I have a problem where I
> need a count of rows unique to multiple columns.
>
> So I have a column family with 3 columns, i.e. a, b, c, and for each
> distinct (a, b, c) value I want the row count.
>
> For example:
>
> A1,B1,C1
> A2,B2,C2
> A3,B3,C2
> A1,B1,C1
>
> The output should be:
>
> A1,B1,C1,2
> A2,B2,C2,1
> A3,B3,C2,1
>
> What is the optimum way of achieving this?
>
> Thanks in advance.
>
> Chirag
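P.S. If you want to stay on the RDD API (no SQL needed, and it works as-is on 1.2.0), here is a minimal sketch of the same group-and-count. It assumes you have mapped your Cassandra rows to an RDD of (a, b, c) tuples; I'm using hardcoded sample data here just to show the shape:

// On 1.2, pair-RDD operations like reduceByKey need this import
// outside the shell (the shell imports it for you):
import org.apache.spark.SparkContext._

// Hypothetical input; in your case this would come from the
// spark-cassandra-connector, mapped down to the three columns.
val rows = sc.parallelize(Seq(
  ("A1", "B1", "C1"),
  ("A2", "B2", "C2"),
  ("A3", "B3", "C2"),
  ("A1", "B1", "C1")
))

// Key by the whole tuple and sum 1s per distinct (a, b, c).
val counts = rows
  .map(t => (t, 1L))
  .reduceByKey(_ + _)

counts.collect().foreach { case ((a, b, c), n) =>
  println(s"$a,$b,$c,$n")
}

reduceByKey combines the per-key counts map-side before the shuffle, so only small (key, count) pairs move across the network rather than whole rows.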