I am trying to create a new RDD from a given PairRDD. I have a PairRDD with
a few keys, but each key has a large number of values (about 100k). I want to
repartition somehow and turn each `Iterable<V>` into an RDD[V] so that I can
then apply map, reduce, sortBy, etc. efficiently on those values. I suspect
flatMapValues is my friend, but I want to check with other Spark users. This is
for a real-time Spark app. I have already tried collect() and computing all
the measures in memory on the app server, but I am trying to improve on that.
This is what I tried (pseudocode):

    class ComputeMetrices {
        transient JavaSparkContext sparkContext;

        public Map<String, V> computeMetrices(JavaPairRDD javaPairRdd) {

          javaPairRdd.groupByKey(10).mapValues(itr -> {
            // NullPointerException here, probably at sparkContext
            sparkContext.parallelize(list(itr));
          })
        }
    }

I want to create an RDD out of the Iterable in the groupByKey result so that I
can use further Spark transformations on it.
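
If the measures can be written as "fold one value in" plus "merge two partial
results", that is the shape `aggregateByKey(zeroValue, seqFunc, combFunc)` on a
JavaPairRDD takes, and it may avoid the nested RDD altogether. Below is a
minimal sketch of that shape in plain Java, without Spark, so that it runs on
its own. It assumes the measures are count, mean and max, which may not match
the real ones, and it would not cover sortBy. The names `PerKeyStats`, `Stats`,
`add`, `merge`, `summarize` and `kv` are hypothetical, and the nested lists
only stand in for partitions.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Spark-free model of per-key aggregation; all names here are hypothetical.
final class PerKeyStats {

    // Running summary of the values seen so far for one key.
    static final class Stats {
        final long count;
        final double sum;
        final double max;

        Stats(long count, double sum, double max) {
            this.count = count;
            this.sum = sum;
            this.max = max;
        }

        double mean() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // Neutral starting value (the "zeroValue" role).
    static final Stats ZERO = new Stats(0, 0.0, Double.NEGATIVE_INFINITY);

    // Fold one value into a partial summary (the "seqFunc" role).
    static Stats add(Stats s, double v) {
        return new Stats(s.count + 1, s.sum + v, Math.max(s.max, v));
    }

    // Merge two partial summaries of the same key (the "combFunc" role).
    static Stats merge(Stats a, Stats b) {
        return new Stats(a.count + b.count, a.sum + b.sum, Math.max(a.max, b.max));
    }

    // Small helper to build a key/value pair.
    static Map.Entry<String, Double> kv(String key, double value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Summarise each "partition" on its own, then merge the partial results per key.
    static Map<String, Stats> summarize(List<List<Map.Entry<String, Double>>> partitions) {
        Map<String, Stats> result = new HashMap<>();
        for (List<Map.Entry<String, Double>> partition : partitions) {
            Map<String, Stats> partial = new HashMap<>();
            for (Map.Entry<String, Double> e : partition) {
                Stats soFar = partial.getOrDefault(e.getKey(), ZERO);
                partial.put(e.getKey(), add(soFar, e.getValue()));
            }
            for (Map.Entry<String, Stats> e : partial.entrySet()) {
                result.merge(e.getKey(), e.getValue(), PerKeyStats::merge);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Stats> stats = summarize(Arrays.asList(
                Arrays.asList(kv("a", 1.0), kv("b", 5.0)),
                Arrays.asList(kv("a", 3.0))));
        for (Map.Entry<String, Stats> e : stats.entrySet()) {
            Stats s = e.getValue();
            System.out.println(e.getKey() + ": count=" + s.count
                    + " mean=" + s.mean() + " max=" + s.max);
        }
    }
}
```

In Spark, `ZERO`, `add` and `merge` would be the three arguments to
`aggregateByKey`, so no Iterable of 100k values would need to be materialised
per key.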

Thanks
Nir
