Hi Chirag,

Maybe something like this?

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val rdd = sc.parallelize(Seq(
  Row("A1", "B1", "C1"),
  Row("A2", "B2", "C2"),
  Row("A3", "B3", "C2"),
  Row("A1", "B1", "C1")
))

val schema = StructType(Seq("a", "b", "c").map(c => StructField(c, StringType)))
val df = sqlContext.createDataFrame(rdd, schema)

df.registerTempTable("rows")
sqlContext.sql("select a, b, c, count(0) as count from rows group by
a, b, c").collect()


Eric


On Thu, Sep 10, 2015 at 2:19 AM, Chirag Dewan <chirag.de...@ericsson.com>
wrote:

> Hi,
>
>
>
> I am using Spark 1.2.0 with Cassandra 2.0.14. I have a problem where I
> need the row count for each unique combination of multiple columns.
>
>
>
> So I have a column family with 3 columns, i.e. a, b, c, and for each
> distinct combination of values (a1, b1, c1) I want the row count.
>
>
>
> For example:
>
> A1,B1,C1
>
> A2,B2,C2
>
> A3,B3,C2
>
> A1,B1,C1
>
>
>
> The output should be:
>
> A1,B1,C1,2
>
> A2,B2,C2,1
>
> A3,B3,C2,1
>
>
>
> What is the optimum way of achieving this?
>
>
>
> Thanks in advance.
>
>
>
> Chirag
>
