You might want to avoid that unionAll(), which looks like it gets repeated over 1000 times. Could you call collect() in each iteration and accumulate the results in a local Array instead of building them up in a DataFrame? How many rows are returned in "temp1"?
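Something along these lines (just a rough sketch reusing your df1 and Array_Ele from below; it only makes sense if each temp1 is small enough to fit on the driver):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row

// Accumulate each iteration's rows locally instead of growing a DataFrame lineage
val results = ArrayBuffer[Row]()

for (a <- 2 until 2720 by 2) {
  val temp = df1.filter(df1("Address").equalTo(Array_Ele(a)))
  val temp1 = temp.select(
    temp("Address"),
    temp("Couple_time") - Array_Ele(a + 1),
    temp("WT_ID"),
    temp("WT_Name"))

  // collect() pulls this iteration's result back to the driver as Array[Row]
  results ++= temp1.collect()
}

println(s"Total rows collected: ${results.size}")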
Xinh

On Tue, Mar 8, 2016 at 10:00 PM, Angel Angel <areyouange...@gmail.com> wrote:
> Hello Sir/Madam,
>
> I am writing a Spark application in Spark 1.4.0.
>
> I have one text file with a size of 8 GB.
> I saved that file in Parquet format:
>
> val df2 = sc.textFile("/root/Desktop/database_200/database_200.txt")
>   .map(_.split(","))
>   .map(p => Table(p(0), p(1).trim.toInt, p(2).trim.toInt, p(3)))
>   .toDF
>
> df2.write.parquet("hdfs://hadoopm0:8020/tmp/input1/database4.parquet")
>
> After that I did the following operations:
>
> val df1 = sqlContext.read.parquet("hdfs://hadoopm0:8020/tmp/input1/database4.parquet")
>
> var a = 0
>
> var k = df1.filter(df1("Address").equalTo(Array_Ele(0)))
>
> for (a <- 2 until 2720 by 2) {
>
>   var temp = df1.filter(df1("Address").equalTo(Array_Ele(a)))
>
>   var temp1 = temp.select(temp("Address"), temp("Couple_time") - Array_Ele(a + 1), temp("WT_ID"), temp("WT_Name"))
>
>   k = k.unionAll(temp1)
> }
>
> val WT_ID_Sort = k.groupBy("WT_ID").count().sort(desc("count"))
>
> WT_ID_Sort.show()
>
> After that I get the following warning and my task is disconnected again and again.
>
> [image: Inline image 1]
>
> I need to do many iterative operations on that df1 file.
>
> So can anyone help me solve this problem?
>
> Thanks in advance.
>
> Thanks.
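P.S. If the per-iteration results are too big to collect, another option would be to avoid the loop altogether: put the (Address, offset) pairs from Array_Ele into a small DataFrame and do a single join. Very roughly, something like the sketch below. It assumes Array_Ele holds the address at even indices and a numeric offset at the following odd index (as your loop suggests), that its element type is something concrete like Int, and that df1's column order matches the selected columns; the names pairs, pairsDF, joined, and adjusted are just for illustration.

import org.apache.spark.sql.functions.desc

// Illustrative helper DataFrame built from Array_Ele: even index = Address, odd index = offset
val pairs = (2 until 2720 by 2).map(a => (Array_Ele(a), Array_Ele(a + 1)))
val pairsDF = sqlContext.createDataFrame(pairs).toDF("Address", "offset")

// One join replaces the per-element filter + unionAll steps
val joined = df1.join(pairsDF, "Address")
val adjusted = joined.select(
  joined("Address"),
  (joined("Couple_time") - joined("offset")).as("Couple_time"),
  joined("WT_ID"),
  joined("WT_Name"))

// The initial Array_Ele(0) slice from your code can still be added with a single unionAll
// (unionAll is position-based, so the column order must match)
val k0 = df1.filter(df1("Address").equalTo(Array_Ele(0)))
val all = k0.unionAll(adjusted)

val WT_ID_Sort = all.groupBy("WT_ID").count().sort(desc("count"))
WT_ID_Sort.show()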