You might want to avoid that unionAll(), which looks like it gets repeated over 1000 times. Could you call collect() in each iteration and accumulate the results in a local Array instead of building them up in a DataFrame? How many rows are returned in "temp1"?
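Something along these lines (just a rough sketch reusing your df1 and Array_Ele from below; it only makes sense if each temp1 is small enough to fit on the driver):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row

// Accumulate each iteration's rows locally instead of growing a DataFrame lineage
val results = ArrayBuffer[Row]()

for (a <- 2 until 2720 by 2) {
  val temp = df1.filter(df1("Address").equalTo(Array_Ele(a)))
  val temp1 = temp.select(
    temp("Address"),
    temp("Couple_time") - Array_Ele(a + 1),
    temp("WT_ID"),
    temp("WT_Name"))

  // collect() pulls this iteration's result back to the driver as Array[Row]
  results ++= temp1.collect()
}

println(s"Total rows collected: ${results.size}")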
Xinh

On Tue, Mar 8, 2016 at 10:00 PM, Angel Angel <areyouange...@gmail.com> wrote:
> Hello Sir/Madam,
>
> I am writing a Spark application in Spark 1.4.0.
>
> I have one text file with a size of 8 GB.
> I saved that file in Parquet format:
>
> val df2 = sc.textFile("/root/Desktop/database_200/database_200.txt")
>   .map(_.split(","))
>   .map(p => Table(p(0), p(1).trim.toInt, p(2).trim.toInt, p(3)))
>   .toDF
>
> df2.write.parquet("hdfs://hadoopm0:8020/tmp/input1/database4.parquet")
>
> After that I did the following operations:
>
> val df1 = sqlContext.read.parquet("hdfs://hadoopm0:8020/tmp/input1/database4.parquet")
>
> var a = 0
>
> var k = df1.filter(df1("Address").equalTo(Array_Ele(0)))
>
> for (a <- 2 until 2720 by 2) {
>
>   var temp = df1.filter(df1("Address").equalTo(Array_Ele(a)))
>
>   var temp1 = temp.select(temp("Address"), temp("Couple_time") - Array_Ele(a + 1), temp("WT_ID"), temp("WT_Name"))
>
>   k = k.unionAll(temp1)
> }
>
> val WT_ID_Sort = k.groupBy("WT_ID").count().sort(desc("count"))
>
> WT_ID_Sort.show()
>
> After that I get the following warning and my task is disconnected again and again.
>
> [image: Inline image 1]
>
> I need to do many iterative operations on that df1 file.
>
> So can anyone help me solve this problem?
>
> Thanks in advance.
>
> Thanks.
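P.S. If the per-iteration results are too big to collect, another option would be to avoid the loop altogether: put the (Address, offset) pairs from Array_Ele into a small DataFrame and do a single join. Very roughly, something like the sketch below. It assumes Array_Ele holds the address at even indices and a numeric offset at the following odd index (as your loop suggests), that its element type is something concrete like Int, and that df1's column order matches the selected columns; the names pairs, pairsDF, joined, and adjusted are just for illustration.

import org.apache.spark.sql.functions.desc

// Illustrative helper DataFrame built from Array_Ele: even index = Address, odd index = offset
val pairs = (2 until 2720 by 2).map(a => (Array_Ele(a), Array_Ele(a + 1)))
val pairsDF = sqlContext.createDataFrame(pairs).toDF("Address", "offset")

// One join replaces the per-element filter + unionAll steps
val joined = df1.join(pairsDF, "Address")
val adjusted = joined.select(
  joined("Address"),
  (joined("Couple_time") - joined("offset")).as("Couple_time"),
  joined("WT_ID"),
  joined("WT_Name"))

// The initial Array_Ele(0) slice from your code can still be added with a single unionAll
// (unionAll is position-based, so the column order must match)
val k0 = df1.filter(df1("Address").equalTo(Array_Ele(0)))
val all = k0.unionAll(adjusted)

val WT_ID_Sort = all.groupBy("WT_ID").count().sort(desc("count"))
WT_ID_Sort.show()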