The error happened at "Lost task 0.0 in stage 0.0". I think it is not a "groupBy" problem; it looks like an issue with the SQL read of the "customer" table. Please check the JDBC link and confirm that the data is loaded successfully, for example with a small bounded read like the sketch below.
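A minimal sketch of such a connectivity check, assuming the standard Spark 1.6
DataFrameReader JDBC API; the LIMIT subquery and the alias "probe" are only
illustrative names, not part of the original code:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("JdbcCheck"))
    val sqlContext = new SQLContext(sc)

    // Read only a handful of rows; if even this fails, the JDBC link itself
    // is the problem rather than the groupBy.
    val probe = sqlContext.read.format("jdbc").options(Map(
      "url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
      "dbtable" -> "(SELECT customer_id, country FROM customer LIMIT 10) AS probe"
    )).load()

    probe.show()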
Thanks,
Xingchi

2016-01-11 15:43 GMT+08:00 Gaini Rajeshwar <raja.rajeshwar2...@gmail.com>:

> Hi All,
>
> I have a table named *customer* (customer_id, event, country, ...) in a
> PostgreSQL database. This table has more than 100 million rows.
>
> I want to know the number of events from each country. To achieve that I
> am doing a groupBy using Spark as follows.
>
> *val dataframe1 = sqlContext.load("jdbc", Map("url" ->
> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
> "dbtable" -> "customer"))*
>
> *dataframe1.groupBy("country").count().show()*
>
> The above code seems to fetch the complete customer table before doing the
> groupBy. Because of that, it throws the following error:
>
> *16/01/11 12:49:04 WARN HeartbeatReceiver: Removing executor 0 with no
> recent heartbeats: 170758 ms exceeds timeout 120000 ms*
> *16/01/11 12:49:04 ERROR TaskSchedulerImpl: Lost executor 0 on 10.2.12.59:
> Executor heartbeat timed out after 170758 ms*
> *16/01/11 12:49:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,
> 10.2.12.59): ExecutorLostFailure (executor 0 exited caused by one of the
> running tasks) Reason: Executor heartbeat timed out after 170758 ms*
>
> I am using Spark 1.6.0.
>
> Is there any way I can solve this?
>
> Thanks,
> Rajeshwar Gaini.
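If the connection turns out to be fine and the full-table scan is the real
problem, one option is to partition the JDBC read so the scan is split across
executors instead of landing on a single task. An untested sketch: it assumes
customer_id is numeric and roughly evenly distributed, and the lowerBound /
upperBound values below are placeholders for the real min/max of that column:

    val dataframe1 = sqlContext.read.format("jdbc").options(Map(
      "url" -> "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
      "dbtable" -> "customer",
      // Split the scan into 20 parallel range queries on customer_id.
      "partitionColumn" -> "customer_id",
      "lowerBound" -> "1",
      "upperBound" -> "100000000",
      "numPartitions" -> "20"
    )).load()

    dataframe1.groupBy("country").count().show()

Alternatively, the aggregation can be pushed into PostgreSQL itself by passing
a subquery as "dbtable", e.g. "(SELECT country, count(*) AS cnt FROM customer
GROUP BY country) AS agg", so only the per-country counts cross the wire.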