Using the Scala API, you can key each line by user and then use combineByKey to build a word-count map per user.
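Something along these lines (an untested sketch; it assumes a tab between the user name and the comma-separated words in the input, and the object name UserWordCount is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

object UserWordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("UserWordCount"))

    // Sample data in the format from your mail; a tab between the
    // user name and the comma-separated words is assumed here.
    val lines = sc.parallelize(Seq("kali\tA,B,A,B,B", "james\tB,A,A,A,B"))

    // Key each line by user, with the split-out words as the value
    val byUser = lines.map { line =>
      val Array(user, words) = line.split("\t")
      (user, words.split(","))
    }

    // Fold an array of words into a running word -> count map
    val merge = (m: Map[String, Int], ws: Array[String]) =>
      ws.foldLeft(m)((acc, w) => acc + (w -> (acc.getOrElse(w, 0) + 1)))

    // combineByKey builds one word-count map per user
    val counts = byUser.combineByKey(
      (ws: Array[String]) => merge(Map.empty[String, Int], ws), // createCombiner
      merge,                                                    // mergeValue
      (a: Map[String, Int], b: Map[String, Int]) =>             // mergeCombiners
        b.foldLeft(a)((acc, kv) =>
          acc + (kv._1 -> (acc.getOrElse(kv._1, 0) + kv._2)))
    )

    // Prints e.g. "kali A [2] B [3]"
    counts.collect.foreach { case (user, m) =>
      println(user + " " + m.map { case (w, n) => s"$w [$n]" }.mkString(" "))
    }

    sc.stop()
  }
}

Thanks,
Aniket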
On Sat, Sep 19, 2015, 6:41 PM kali.tumm...@gmail.com <kali.tumm...@gmail.com> wrote:

> Hi All,
>
> I would like to produce the output below using Spark. I managed to write it
> in Hive and call that from Spark, but not in plain Spark (Scala). How do I
> group word counts by a particular user (column)? Imagine users and their
> tweets: I want to do a word count per user name.
>
> Input:-
> kali A,B,A,B,B
> james B,A,A,A,B
>
> Output:-
> kali A [Count] B [Count]
> james A [Count] B [Count]
>
> My Hive Answer:-
> CREATE EXTERNAL TABLE TEST
> (
>   user_name string,
>   COMMENTS STRING
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE
> LOCATION '/data/kali/test';
> -- HDFS folder (create the HDFS folder and put a text file there with the
> -- data mentioned in this email)
>
> use default;
> select user_name, COLLECT_SET(text)
> from (select user_name, concat(sub, ' ', count(comments)) as text
>       from test LATERAL VIEW explode(split(comments, ',')) subView AS sub
>       group by user_name, sub) w
> group by user_name;
>
> Spark with Hive:-
> package com.examples
>
> /**
>  * Created by kalit_000 on 17/09/2015.
>  */
> import org.apache.log4j.{Level, Logger}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.{SparkConf, SparkContext}
>
> object HiveWordCount {
>
>   def main(args: Array[String]): Unit = {
>     Logger.getLogger("org").setLevel(Level.WARN)
>     Logger.getLogger("akka").setLevel(Level.WARN)
>
>     val conf = new SparkConf().setMaster("local").setAppName("HiveWordCount")
>       .set("spark.executor.memory", "1g")
>     val sc = new SparkContext(conf)
>     val sqlContext = new SQLContext(sc)
>     val hc = new HiveContext(sc)
>
>     // Same DDL as the Hive answer; the field delimiter is '\001',
>     // escaped here so Hive receives it intact
>     hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS default.TEST (user_name string, " +
>       "COMMENTS STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001' " +
>       "STORED AS TEXTFILE LOCATION '/data/kali/test'")
>
>     val op = hc.sql("select user_name, COLLECT_SET(text) from " +
>       "(select user_name, concat(sub, ' ', count(comments)) as text " +
>       "from default.test LATERAL VIEW explode(split(comments, ',')) subView AS sub " +
>       "group by user_name, sub) w group by user_name")
>
>     op.collect.foreach(println)
>   }
> }
>
> Thanks