Hi users,
I’m new to RDD programming, and my problem is as described in the title. I
read a source file through sc and do a groupByKey to get a new RDD. Now I want
to do another groupByKey on every element of that RDD.
For example, my source file is as follows:
hello_world,1
hello,1
hello_world_spark,3
hello_scala,4
spark_rdd,1
spark_rdd_program,1
spark,1
spark_sql,3
After my first round of groupByKey, I get an RDD like this:
hello,((world,1),(world_spark,3),(scala,4))
spark,((rdd,1),(rdd_program,1),(sql,3))
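To make the first round concrete, here is a minimal sketch in plain Python (no Spark), assuming each key is split at its first underscore and that bare words such as "hello,1" are dropped, since they do not appear in the output above. The helper names split_head and group_by_key are hypothetical; group_by_key only imitates what groupByKey does on a pair RDD.

```python
from collections import defaultdict

LINES = [
    "hello_world,1", "hello,1", "hello_world_spark,3", "hello_scala,4",
    "spark_rdd,1", "spark_rdd_program,1", "spark,1", "spark_sql,3",
]

def split_head(line):
    """Turn 'hello_world_spark,3' into ('hello', ('world_spark', 3))."""
    key, count = line.rsplit(",", 1)
    head, _, rest = key.partition("_")  # split at the first underscore only
    return head, (rest, int(count))

def group_by_key(pairs):
    """Plain-Python stand-in for groupByKey on (key, value) pairs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

# Assumption: records with an empty remainder ('hello,1', 'spark,1') are
# dropped, which matches the first-round output listed above.
pairs = [p for p in map(split_head, LINES) if p[1][0]]
first_round = group_by_key(pairs)

for head, values in first_round.items():
    print(head, values)
```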
The next step is where my problem lies. I want to groupByKey on the content of
the values, such as “world”, “rdd”, “scala”, and “sql”. It seems I need to
group by the values of every element, but Spark does not support nested RDDs,
so what can I do to solve this?
What I am actually doing is building a tree in which every node is a word in a
sentence and the root node is null. In my example, the two children of the
root node are “hello” and “spark”; “hello” has two children (“world” and
“scala”), and “spark” also has two children (“rdd” and “sql”).
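One possible way to get this tree without nesting is to key every word by the full path of words before it, so that a single flat grouping covers all levels at once. The sketch below is plain Python rather than Spark, and the names edges and build_children are hypothetical. The empty tuple stands in for the null root. On an RDD the same shape would presumably be a flatMap that emits the (path, child) pairs, followed by distinct and groupByKey, but that is an assumption and not something tested here.

```python
from collections import defaultdict

LINES = [
    "hello_world,1", "hello,1", "hello_world_spark,3", "hello_scala,4",
    "spark_rdd,1", "spark_rdd_program,1", "spark,1", "spark_sql,3",
]

def edges(line):
    """Emit one (parent_path, child) pair per word in the key.

    'hello_world_spark,3' yields
    ((), 'hello'), (('hello',), 'world'), (('hello', 'world'), 'spark').
    """
    key, _count = line.rsplit(",", 1)
    words = key.split("_")
    return [(tuple(words[:i]), words[i]) for i in range(len(words))]

def build_children(lines):
    """Group children by parent path; a set removes duplicate edges."""
    children = defaultdict(set)
    for line in lines:
        for parent, child in edges(line):
            children[parent].add(child)
    # Sort each child list so the result is deterministic.
    return {parent: sorted(kids) for parent, kids in children.items()}

tree = build_children(LINES)

for parent, kids in sorted(tree.items()):
    print(parent, kids)
```

Because each key carries the whole path from the root, deeper levels such as ("hello", "world") need no further grouping rounds. The counts are ignored in this sketch; they could be carried along in the value if the nodes need them.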
Please help.
Thank you very much.
Mars.