Re: number limit of map for spark

Zhiliang Zhu Mon, 21 Dec 2015 10:44:05 -0800

What is the difference between repartition / collect and collapse? Is collapse
as costly as collect or repartition?
Thanks in advance ~


    On Tuesday, December 22, 2015 2:24 AM, Zhan Zhang <[email protected]> 
wrote:
 

 In what situation do you have such a case? If there is no shuffle, you can
collapse all these map functions into one, right? Meanwhile, it is not
recommended to collect all data to the driver.
Thanks.
Zhan Zhang
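Zhan's suggestion to "collapse all these functions into one" can be sketched in plain Java so it runs without a cluster. Here `step` is a hypothetical, deterministic stand-in for the body of one map call (the original used Math.random()), and `FuseMaps` is an illustrative name. In Spark the fused function would be passed to a single r.map(...) instead of calling map m times, so each outer iteration adds one RDD to the lineage instead of m. Since map involves no shuffle, Spark already pipelines consecutive maps within one stage, so fusing mostly shortens the lineage rather than saving compute. This is also why collapsing is cheap compared with the other two: it moves no data, whereas collect pulls every element to the driver and repartition always performs a shuffle.

```java
import java.util.function.IntUnaryOperator;

public class FuseMaps {
    // Hypothetical per-element step standing in for the body of one map() call.
    public static int step(int x) {
        return (x * 31 + 7) % 1000;
    }

    // Applies `step` m times as m separate passes over the data,
    // mimicking m chained map() calls (each one adds a level of lineage).
    public static int[] chained(int[] data, int m) {
        int[] cur = data.clone();
        for (int i = 0; i < m; i++) {
            int[] next = new int[cur.length];
            for (int k = 0; k < cur.length; k++) {
                next[k] = step(cur[k]);
            }
            cur = next;
        }
        return cur;
    }

    // Builds ONE function that applies `step` m times in a loop.
    // In Spark this single function would go into a single map() call.
    public static IntUnaryOperator fused(int m) {
        return x -> {
            int v = x;
            for (int i = 0; i < m; i++) {
                v = step(v);
            }
            return v;
        };
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        int m = 400; // more than the ~350 maps reported to overflow
        IntUnaryOperator f = fused(m);
        int[] once = new int[data.length];
        for (int k = 0; k < data.length; k++) {
            once[k] = f.applyAsInt(data[k]);
        }
        // prints true: one fused pass equals m chained passes
        System.out.println(java.util.Arrays.equals(once, chained(data, m)));
    }
}
```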
On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu <[email protected]> wrote:

Dear All,
I need to iterate some job / RDD a great many times, but I am stuck on the
problem that Spark only accepts around 350 calls to map before it meets one
action function; besides, dozens of actions will obviously increase the run
time. Is there any proper way ...
As tested, here is a piece of code:
......
    int count = 0;
    JavaRDD<Integer> dataSet = jsc.parallelize(list, 1).cache(); // with only 1 partition
    int m = 350;
    JavaRDD<Integer> r = dataSet.cache();
    JavaRDD<Integer> t = null;

    for (int j = 0; j < m; ++j) { // outer loop: temporarily convert RDD r to t
      if (null != t) {
        r = t;
      }
      // inner loop: call map m (350) times. If m is much larger than 350
      // (for instance, around 400), the job throws:
      // "15/12/21 19:36:17 ERROR yarn.ApplicationMaster: User class threw exception:
      //  java.lang.StackOverflowError java.lang.StackOverflowError"
      for (int i = 0; i < m; ++i) {
        r = r.map(new Function<Integer, Integer>() {
          @Override
          public Integer call(Integer integer) {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return (x * x + y * y < 1) ? 1 : 0;
          }
        });
      }

      // then collect this RDD and parallelize it into another RDD; however,
      // dozens of actions such as collect are VERY costly
      List<Integer> lt = r.collect();
      t = jsc.parallelize(lt, 1).cache();
    }
......
Thanks very much in advance!
Zhiliang
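The StackOverflowError most likely comes from lineage depth rather than from map itself: each map returns a new RDD that references its parent, and Spark walks and serializes that chain recursively, so a long enough chain can exhaust the driver's stack. The collect/parallelize round trip avoids it because the parallelized RDD has no parents, but it ships all data through the driver. Spark's own way to cut lineage is checkpointing: call jsc.setCheckpointDir(...) once, then r.checkpoint() every N iterations followed by an action such as r.count() so the checkpoint is actually written (persisting r first avoids computing it twice). The plain-Java toy below is not Spark code; the `Node` class, `run`, and the `CHECKPOINT_EVERY` interval are hypothetical, and it only models why truncating the chain every N steps keeps recursion depth bounded.

```java
import java.util.function.IntUnaryOperator;

public class LineageDemo {
    // Hypothetical stand-in for an RDD: either materialized data (no parent)
    // or a function applied to a parent node.
    public static final class Node {
        final Node parent;        // null once materialized ("checkpointed")
        final int value;          // meaningful only when parent == null
        final IntUnaryOperator fn;

        Node(int value) { this.parent = null; this.value = value; this.fn = null; }
        Node(Node parent, IntUnaryOperator fn) { this.parent = parent; this.value = 0; this.fn = fn; }

        // Recursive evaluation: recursion depth equals the chain length,
        // the property that overflows the stack for very long lineages.
        public int compute() {
            return parent == null ? value : fn.applyAsInt(parent.compute());
        }

        // Number of links from this node back to materialized data (iterative).
        public int depth() {
            int d = 0;
            for (Node n = this; n.parent != null; n = n.parent) d++;
            return d;
        }
    }

    static final int CHECKPOINT_EVERY = 100; // hypothetical interval

    // Applies `steps` maps; with truncation, every CHECKPOINT_EVERY maps the chain is
    // replaced by a materialized node (modelling what checkpoint() does to lineage).
    public static Node run(int start, int steps, boolean truncate) {
        Node r = new Node(start);
        for (int i = 1; i <= steps; i++) {
            r = new Node(r, x -> (x * 31 + 7) % 1000);
            if (truncate && i % CHECKPOINT_EVERY == 0) {
                r = new Node(r.compute()); // materialize, drop all parents
            }
        }
        return r;
    }

    public static void main(String[] args) {
        int steps = 350 * 350; // outer x inner iterations from the code above
        Node cut = run(1, steps, true);
        System.out.println("depth with truncation: " + cut.depth());
        System.out.println("result: " + cut.compute());
        // Without truncation the chain is `steps` long; calling compute() on it
        // would likely throw StackOverflowError, so only its depth is printed.
        System.out.println("depth without truncation: " + run(1, steps, false).depth());
    }
}
```

Choosing N is a trade-off: a smaller interval keeps the chain short but writes to the checkpoint directory more often.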
