Hi Jerry,

It does not work directly for 2 reasons:

1. I am trying to do this using Spark Streaming (Window DStreams) and
   DataFrames API does not work with Streaming yet.

2. The query equivalent has a "distinct" embedded in it i.e. I am looking
   to achieve the equivalent of

   SELECT key, count(distinct(value)) from table group by key

Thanks
Nikunj


On Sun, Jul 19, 2015 at 2:28 PM, Jerry Lam <chiling...@gmail.com> wrote:

> You mean this does not work?
>
> SELECT key, count(value) from table group by key
>
>
>
> On Sun, Jul 19, 2015 at 2:28 PM, N B <nb.nos...@gmail.com> wrote:
>
>> Hello,
>>
>> How do I go about performing the equivalent of the following SQL clause
>> in Spark Streaming? I will be using this on a Windowed DStream.
>>
>> SELECT key, count(distinct(value)) from table group by key;
>>
>> so for example, given the following dataset in the table:
>>
>>  key | value
>> -----+-------
>>  k1  | v1
>>  k1  | v1
>>  k1  | v2
>>  k1  | v3
>>  k1  | v3
>>  k2  | vv1
>>  k2  | vv1
>>  k2  | vv2
>>  k2  | vv2
>>  k2  | vv2
>>  k3  | vvv1
>>  k3  | vvv1
>>
>> the result will be:
>>
>>  key | count
>> -----+-------
>>  k1  | 3
>>  k2  | 2
>>  k3  | 1
>>
>> Thanks
>> Nikunj
>>
>>
>
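
For reference, the logic behind count(distinct(value)) grouped by key can be sketched in two steps: first drop duplicate (key, value) pairs, then count the surviving pairs per key. The snippet below is a minimal illustration in plain Python on the example dataset from the original question; it is not Spark code, and the function name count_distinct_by_key is hypothetical. On a windowed DStream the same shape could plausibly be expressed by de-duplicating the pairs of each window's RDD (for example transform with distinct), mapping each remaining pair to (key, 1), and summing with reduceByKey, but that mapping is an assumption and is not confirmed anywhere in this thread.

```python
from collections import defaultdict


def count_distinct_by_key(pairs):
    """Return {key: number of distinct values seen for that key}.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    # Step 1: drop duplicate (key, value) pairs.
    distinct_pairs = set(pairs)

    # Step 2: each surviving pair contributes 1 to its key's count.
    counts = defaultdict(int)
    for key, _value in distinct_pairs:
        counts[key] += 1
    return dict(counts)


# Example dataset from the original question.
rows = [
    ("k1", "v1"), ("k1", "v1"), ("k1", "v2"), ("k1", "v3"), ("k1", "v3"),
    ("k2", "vv1"), ("k2", "vv1"), ("k2", "vv2"), ("k2", "vv2"), ("k2", "vv2"),
    ("k3", "vvv1"), ("k3", "vvv1"),
]

result = count_distinct_by_key(rows)
for key in sorted(result):
    print(key, result[key])
```

On the example data this yields 3 for k1, 2 for k2 and 1 for k3, matching the expected result table in the question. De-duplicating before counting keeps the per-key state small, which is the main reason to prefer it over collecting every value per key.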